CONCEPTS AND METHODS IN MULTI-PERSON
COORDINATION AND CONTROL*


TAMER BAŞAR and JOSE B. CRUZ, JR.
Department of Electrical Engineering and
Coordinated Science Laboratory
University of Illinois
1101 W. Springfield Avenue
Urbana, Illinois 61801 USA


Abstract

In this chapter we discuss some key concepts and methods relevant
to multi-person decision-making and optimization in deterministic and
stochastic dynamic systems. Specifically, we consider systems defined in
discrete time, and treat the team, Nash and Stackelberg (leader-follower)
solution concepts under different information structures. We provide an
up-to-date survey of the literature on these topics, and also present some
new results.


*This paper will appear as a chapter of the book "Optimization and
Control of Dynamic Operational Research Models," edited by S. G. Tzafestas,
North Holland, 1982.


1. INTRODUCTION


Much of decision and control theory is concerned with a single
decision-maker with a single objective function. Multiple objective functions
have also been considered, but usually these are associated with a single
decision-maker. Large-scale systems and dynamic operations research models
are likely to have a multiplicity of decision-makers, each of whom may have
multiple objectives. Even when each decision-maker has only one objective
function, the optimization problem is significantly more complex than that
for a situation with only one decision-maker.

This chapter provides a discussion of some of the key concepts and
methods that are appropriate to multiperson decision-making. When two or
more decision-makers have separate objective functions, it is generally not
possible to optimize all the objective functions simultaneously. One
important exception is the case when all the objective functions are the
same. Even in this case, the information available to each decision-maker
may differ from that available to the others, and the problem of determining
the mapping from the information space to the decision space for each
decision-maker is more complex than that for a central decision-maker.

When cooperation among the decision-makers can be expected, an
appropriate solution concept is that of Pareto-optimality. Otherwise, a
natural concept is that of Nash equilibrium. In situations where a
hierarchical decision structure is relevant, the Stackelberg or leader-follower
concept is useful. These concepts will be discussed in both a deterministic
and a stochastic setting.


In Section 2 we set the stage by providing motivational examples,
modeling the multiperson decision problem, and defining the various solution
concepts. In Section 3 we develop the concepts and methods appropriate for
multiperson decision problems in deterministic systems and deterministic
operations research models. The stochastic decision problem is formulated
and treated in detail in Section 4. Section 5 briefly describes some
examples, and Section 6 includes some concluding remarks. An extensive
bibliography is included at the end of the chapter.


2. MULTI-PERSON DECISION PROBLEMS


In this section we provide a general discussion on the formulation
of multi-person, and possibly multicriteria, coordination and control problems
that involve uncertainty, informational decentralization and possible conflicts
of interest (among the decision makers). We also discuss possible solution
concepts for such decision problems. Before going into a formal presentation,
let us first consider a few examples (in Section 2.1) to motivate the general
formulation in the sequel.


2.1. Examples for Motivation

a. Optimum resource allocation under uncertainty

Consider a firm with (for simplicity) two divisions. The upper-level
division (the headquarters) has the task of coordinating the units (of
production) at the lower-level division, under incomplete information as
regards their production capabilities, availability, and quantity of
resources, etc. Furthermore, there are m common resources which are to be
used by some or all units in production, and therefore the headquarters has
to allocate these to the units in accordance with their needs. The units may
communicate their needs to the headquarters; and based on this information and
some other measurements, the headquarters will have to decide on the optimal
allocation that maximizes the profit to the firm (or some other appropriate
utility function). One other task of the headquarters is to design an
incentive scheme for remuneration of the production units, which will induce
each such unit to report his true need (i.e. not to cheat in his transmittal
of information) and to utilize the allocated resources most efficiently (so as,
say, to maximize the unit's share of the profit of the firm). An optimum
coordination effort on the part of the headquarters will therefore force the
units to behave as a team, even though the units may have objectives somewhat
different from that of the headquarters and operate under decentralized
information.

This problem is one of multi-person coordination and control, which
exhibits a hierarchy in decision making: the coordinator (headquarters) is
in a position to impose his policy on the other decision-makers (the units of
the lower-level division). It also involves incomplete information, uncertainty,
and a dynamic decision process with multiple criteria.

b. Arms race between two nations

There is a dynamic model, known as Richardson's arms race model [117],
which describes qualitatively the armament buildup between two nations, and in
which the decision variables may be taken as the rates of increase or
decrease in the armament levels. In deciding whether to increase or decrease
its current armament level, each nation will have to take a few factors into
account, namely (i) the current armament level of the other nation, (ii) the
economic burden associated with any possible increase in the current armament
level, (iii) the response history of the other nation to past armament
policies, and (iv) the uncertainty associated with all this information. Yet
another factor that affects the decision process is the nations' grievances
and hatreds towards their "opponents". The objective of each nation will be
to maximize an expected utility function that reflects a tradeoff between
expected economic prosperity and national security.

This is clearly a dynamic decision process which involves two decision
makers with different objectives and whose decisions are intercoupled. It
involves uncertainty, incomplete information and noncooperative decision
making.

c. Water pollution control

There are M chemical plants, located on the shores of a river, whose
waste discharges pour directly into the river with no (or very little) pollution
treatment. The municipality decides to take measures against this, either
through a subsidy program or by penalizing those who do not properly treat
their waste discharge. Assuming that the municipality is in a position to
collect data from the river, the question is what type of subsidy (or
penalty) program to adopt that will force the chemical plants to treat their
waste discharges properly, so that the pollution content of the river stays
below certain preset limits which become more stringent over the years. This
is a dynamic multi-person decision problem which involves uncertainty and
multiple criteria. There is a conflict of interest between the municipality
and the chemical plants, and there may also be some conflicts of interest
among the individual plants.

2.2. A General Formulation

A general formulation of a multi-person decision problem requires
delineation of the following information:

(i) A set of decision makers (DMs), or the so-called agents. Denote
this set by $\mathbf{M} = \{1, 2, \ldots, M\}$ and a typical element by m.

(ii) An underlying probability space $(\Omega, \mathcal{F}, \mathcal{P})$ for the
uncertainties, which are beyond the control of the DMs.

(iii) The length of the horizon on which the decision process is
defined. Here we will adopt a discrete-time formulation with a finite horizon,
and denote the number of stages by N. Let a typical element of
$\mathbf{N} = \{1, \ldots, N\}$ be denoted by n.
(iv) A set of possible alternatives (decisions) for DM m at stage
n, to be denoted by $U_n^m$, with $u_n^m \in U_n^m$ being a typical element. In the
most general setting, $U_n^m$ may depend on the present and past decisions of the
other agents (i.e. it may not be rectangular); but here we will assume $U_n^m$ to
be rectangular for every $n \in \mathbf{N}$, $m \in \mathbf{M}$.

(v) A mathematical description of the interaction of the DMs
within the system and among themselves, and with the uncertain states of the
environment, i.e. specification of a system equation of the type

$$ x_{n+1} = f_n(x_n, u_n^1, \ldots, u_n^M, \theta_n), \tag{2.1} $$

where $x_n, x_{n+1} \in X$ (the state space), and $\theta_n$ denotes the uncertainty
affecting the outcome of the decisions at stage n. An alternate description
would be specification of the probability distribution of $x_{n+1}$ conditioned on
the set of vectors $\{x_n, u_n^1, \ldots, u_n^M\}$; but we will adopt the state-space
description (2.1).
n  n  n 

(vi) An information structure for each DM, which characterizes the
precise static or dynamic information gained and recalled by that DM at each
stage of the decision process. Each such information structure will generate
an appropriate information space (say $Z_n^m$) for DM m at stage n. In the case
of deterministic information patterns, each DM will have access to some or
all components of the present and past values of the state vector, as well
as to the past control values of some of the other DMs. In the case of
stochastic information patterns, DMs will have access to noise-corrupted
measurements of the state vector, say

$$ y_n^m = h_n^m(x_n, \theta_n^m) \tag{2.2} $$

for DM m at stage n, where $\theta_n^m$ denotes the uncertainty corrupting the
measurement. Then, the information available to DM m at stage n (denoted
$\eta_n^m$) will comprise a sub-collection of the set of vectors
$\{y_n^1, y_{n-1}^1, \ldots, y_1^1; \ldots; y_n^M, \ldots, y_1^M; u_{n-1}^1, \ldots, u_1^1; \ldots; u_{n-1}^M, \ldots, u_1^M\}$.
If all these vectors take values in finite-dimensional spaces, then the
information space $Z_n^m$ will also be finite dimensional. [Further discussion
will be devoted to this topic in the following sections; see in particular
Section 4.1.]

(vii) Permissible strategies (decision laws) for each DM, defined
as appropriate mappings from his information space into his decision space.
Let $\pi^m = \{\gamma_1^m, \gamma_2^m, \ldots, \gamma_N^m\}$, where $\gamma_n^m : Z_n^m \to U_n^m$ is a measurable
mapping. We refer to $\pi^m$ as a strategy (decision law, control law) of DM m,
and denote the class of all permissible strategies for DM m by $\Pi^m$. Each
permissible sub-strategy $\gamma_n^m$ will be assumed to belong to a sub-strategy
set $\Gamma_n^m$, which will have to be appropriately defined for the problem under
consideration.

Permissible strategies, as introduced above, are also known as pure
strategies, as opposed to mixed strategies, which are defined as probability
measures on $\times_{n=1}^{N} \Gamma_n^m$, or behavioral strategies, which are defined as
independent probability measures on $\Gamma_n^m$, $n \in \mathbf{N}$. In the sequel we will deal
only with pure strategies and refer to them simply as strategies.

(viii) An objective functional for each DM that summarizes
(mathematically) his preference ordering among different alternatives, for
each fixed permissible strategy of the remaining DMs. Hence, we assume
existence of a real-valued function $J^m : \Pi^1 \times \Pi^2 \times \cdots \times \Pi^M \to \mathbb{R}$ for each
$m \in \mathbf{M}$, which DM m strives to optimize (say minimize) by his choice of strategy
$\pi^m \in \Pi^m$. Note that the effect of uncertainty (if any) is absorbed in this
formulation through a possible expected utility approach. This point will be
further discussed in Section 4, where a more precise description for a
subclass of problems can be found.
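To make items (i)-(viii) concrete, the following Python sketch evaluates the objective functionals of a small instance by Monte Carlo simulation, in line with the expected-utility interpretation in (viii). Everything specific here is a hypothetical choice for illustration, not part of the formulation above: a scalar state, two DMs, additive Gaussian uncertainty in (2.1) and (2.2), the stage cost $x_{n+1}^2 + r_m (u_n^m)^2$, and memoryless linear strategies $\gamma_n^m(\eta_n^m) = -k_m y_n^m$ that use only the current measurement. The function name `expected_costs` is likewise hypothetical.

```python
import numpy as np


def expected_costs(gains, r, x1, N, sigma_w, sigma_v, n_samples=10_000, seed=0):
    """Monte Carlo estimate of J^m = E[sum_n (x_{n+1}^2 + r_m * (u_n^m)^2)] for each DM m.

    Hypothetical scalar instance of (2.1)-(2.2):
        x_{n+1} = x_n + u_n^1 + u_n^2 + theta_n,   y_n^m = x_n + theta_n^m,
    where DM m uses the strategy gamma_n^m(y_n^m) = -gains[m] * y_n^m at every stage.
    """
    rng = np.random.default_rng(seed)
    k = np.asarray(gains, dtype=float)[:, None]      # shape (M, 1)
    r = np.asarray(r, dtype=float)[:, None]          # shape (M, 1)
    x = np.full(n_samples, float(x1))                # one state per sampled realization
    J = np.zeros(k.shape[0])
    for _ in range(N):
        y = x + sigma_v * rng.standard_normal((k.shape[0], n_samples))     # measurements (2.2)
        u = -k * y                                                         # decisions u_n^m
        x = x + u.sum(axis=0) + sigma_w * rng.standard_normal(n_samples)   # state update (2.1)
        J += np.mean(x**2 + r * u**2, axis=1)        # stage cost, averaged over samples
    return J


print(expected_costs(gains=(0.25, 0.25), r=(1.0, 2.0), x1=1.0, N=10,
                     sigma_w=0.1, sigma_v=0.2))
```

With both noise levels set to zero the problem becomes deterministic and the estimate reduces to the exact stage-additive cost: one stage from $x_1 = 1$ with gains (0.25, 0.25) gives $x_2 = 0.5$ and costs (0.3125, 0.375) for $r = (1, 2)$.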

We should note in passing that, in the class of problems described
above, the order in which each agent acts is predetermined; there exist more
general formulations, however [see Witsenhausen (1971a)], which allow for
the order of action to be determined by a chance mechanism (which is a part of
the uncertainty) and by the past actions of the agents. We do not discuss such
generalizations here.


2.3. Solution Concepts

The general formulation of Section 2.2 is not complete unless we
specify the precise mode of decision making among the agents. Even though each
agent will attempt to minimize his corresponding objective functional, this
goal certainly cannot be achieved independently of the decisions of the other
agents, unless the objective functional of that DM happens to be independent of
all the other DMs' strategies. Hence, in order to complete the formulation of
a multi-person decision problem, we have to introduce rational modes of decision
making. Some selected possibilities are discussed in the sequel.

Team solution

When all agents have a common goal, we have a team problem with a
single objective functional $J = J^1 = J^2 = \cdots = J^M$, and then an optimum (team)
solution $\pi^* = \{\pi^{1*}, \ldots, \pi^{M*}\}$ is defined by

$$ J(\pi^*) \le J(\pi), \qquad \forall \pi \in \Pi, \tag{2.3} $$

where we use the notation $\pi \in \Pi$ to denote $\{\pi^m \in \Pi^m,\ m \in \mathbf{M}\}$.
In this context, a solution concept that is somewhat weaker than the
team solution is that of person-by-person optimality. Let
$\pi^{\hat m} := \{\pi^1, \ldots, \pi^{m-1}, \pi^{m+1}, \ldots, \pi^M\}$. Then $\pi^* \in \Pi$ is person-by-person optimal if,
for all $m \in \mathbf{M}$,

$$ J(\pi^*) \le J(\pi^{\hat m *}, \pi^m), \qquad \forall \pi^m \in \Pi^m. \tag{2.4} $$

Note that every optimum team solution is person-by-person optimal, but not
vice versa.
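The gap between (2.3) and (2.4) already shows up in a two-DM team with finitely many actions. The Python sketch below enumerates both solution sets for a hypothetical common cost matrix in which DM 1 picks the row and DM 2 picks the column; the cost values and the function names are illustrative assumptions.

```python
from itertools import product


def team_optimal(J):
    """Action pairs (i, j) that minimize the common cost J[i][j], as in (2.3)."""
    best = min(min(row) for row in J)
    return [(i, j) for i, row in enumerate(J) for j, v in enumerate(row) if v == best]


def person_by_person_optimal(J):
    """Action pairs from which neither DM can lower J by deviating alone, as in (2.4)."""
    rows, cols = len(J), len(J[0])
    return [
        (i, j)
        for i, j in product(range(rows), range(cols))
        if all(J[i][j] <= J[k][j] for k in range(rows))     # DM 1 (row) cannot improve
        and all(J[i][j] <= J[i][l] for l in range(cols))    # DM 2 (column) cannot improve
    ]


J = [[0, 2],
     [2, 1]]          # hypothetical common cost: DM 1 picks the row, DM 2 the column
print(team_optimal(J))               # [(0, 0)]
print(person_by_person_optimal(J))   # [(0, 0), (1, 1)]
```

The pair (1, 1) is person-by-person optimal, since neither DM can lower the cost by deviating alone, yet it is not team-optimal: reaching (0, 0) requires both DMs to change their actions together.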


Pareto-optimal solution

When the agents do not all have the same goal, but still act
cooperatively, a reasonable equilibrium concept is provided by the
Pareto-optimal solution. We call a subset $\Pi^P \subset \Pi$ a Pareto-optimal set if
there exists no element in $\Pi^P$ which is dominated by a strategy from $\Pi$, i.e.
there do not exist $\pi^P \in \Pi^P$ and $\pi \in \Pi$ with the property

$$ J^m(\pi) \le J^m(\pi^P) \quad \forall m \in \mathbf{M}, \qquad \text{and} \qquad J^i(\pi) < J^i(\pi^P) \ \text{ for at least one } i \in \mathbf{M}. \tag{2.5} $$

In other words, $\Pi^P$ is the collection of nondominated strategies in $\Pi$.

Any element of the Pareto-optimal set is known as a Pareto-optimal
solution for the problem under consideration, which is in general not unique.
Under certain conditions [see DaCunha and Polak (1967)], the set of
Pareto-optimal solutions can be obtained by considering a convex combination of
the $J^m$'s,

$$ J_\lambda(\pi) := \sum_{m=1}^{M} \lambda_m J^m(\pi), \qquad 0 \le \lambda_m \le 1, \qquad \sum_{m=1}^{M} \lambda_m = 1, $$

and by minimizing $J_\lambda(\pi)$ over $\Pi$ for fixed $\lambda = (\lambda_1, \ldots, \lambda_M)$. This yields a
solution parameterized by $\lambda$, which generates the Pareto-optimal set.
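The weighted-sum construction can be checked numerically. The Python sketch below uses a hypothetical two-DM problem in which $\pi = (u_1, u_2)$ ranges over a finite grid, with $J^1 = (u_1 - 1)^2 + u_2^2$ and $J^2 = u_1^2 + (u_2 - 1)^2$; it minimizes $\lambda J^1 + (1 - \lambda) J^2$ for several values of $\lambda$ and confirms, via the dominance test (2.5), that each minimizer is nondominated on the grid. The grid, the costs and the helper names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical two-DM problem: pi = (u1, u2) ranges over a finite grid.
grid = np.linspace(-0.5, 1.5, 41)
U1, U2 = np.meshgrid(grid, grid, indexing="ij")
points = np.column_stack([U1.ravel(), U2.ravel()])
costs = np.column_stack([
    (points[:, 0] - 1.0) ** 2 + points[:, 1] ** 2,      # J^1
    points[:, 0] ** 2 + (points[:, 1] - 1.0) ** 2,      # J^2
])


def is_dominated(k):
    """True if some grid point satisfies (2.5) against grid point k."""
    weakly_better = np.all(costs <= costs[k], axis=1)
    strictly_better = np.any(costs < costs[k], axis=1)
    return bool(np.any(weakly_better & strictly_better))


def scalarized_minimizer(lam):
    """Index of the grid point minimizing lam*J^1 + (1 - lam)*J^2."""
    return int(np.argmin(lam * costs[:, 0] + (1.0 - lam) * costs[:, 1]))


for lam in np.linspace(0.0, 1.0, 5):
    k = scalarized_minimizer(lam)
    print(f"lambda={lam:.2f}  u={points[k].round(2)}  J={costs[k].round(3)}  "
          f"dominated={is_dominated(k)}")
```

For this strictly convex example the minimizer is $u = (\lambda, 1 - \lambda)$, so sweeping $\lambda$ traces the whole Pareto set. For nonconvex problems the weighted sum can miss Pareto points, which is why the result above is stated only under certain conditions.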

It should be noted that a critical assumption in Pareto-optimality
is cooperation. Specifically, if $\pi^*$ is a Pareto-optimal solution adopted by
all the agents, one of them, say the mth one, may attain a better performance
by minimizing $J^m(\pi^{\hat m *}, \pi^m)$ over $\Pi^m$; but he has to refrain from adopting
this policy (under the cooperative mode of decision making), since a better
performance for one DM (at a Pareto solution point) necessarily implies a worse
performance for some other DM.


Nash equilibrium solution

When cooperation cannot be enforced in a multi-person multi-criteria
decision problem, a solution concept that safeguards against cheating by a
single DM is the Nash equilibrium. We say that an M-tuple of strategies
$\pi^* = \{\pi^{1*}, \ldots, \pi^{M*}\}$ provides a Nash equilibrium solution if, for all $m \in \mathbf{M}$,

$$ J^m(\pi^*) \le J^m(\pi^{\hat m *}, \pi^m), \qquad \forall \pi^m \in \Pi^m. \tag{2.6} $$

Note that, for the special case when $J^m$, $m \in \mathbf{M}$, are identical, this solution
concept coincides with person-by-person optimality; furthermore, when
$\mathbf{M} = \{1, 2\}$ and $J^1 \equiv -J^2 =: J$, we have a single inequality

$$ J(\pi^{1*}, \pi^2) \le J(\pi^*) \le J(\pi^1, \pi^{2*}), \qquad \forall \pi^1 \in \Pi^1,\ \pi^2 \in \Pi^2, \tag{2.7} $$

which is known as the saddle-point inequality, and the corresponding
equilibrium solution is known as a saddle-point solution. This latter case
characterizes a situation in which the two DMs have completely conflicting goals.
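For two DMs with finitely many actions, (2.6) can be checked cell by cell. The Python sketch below, written for a cost-minimizing formulation with hypothetical matrices and a hypothetical function name, lists all pure-strategy Nash pairs; passing $J$ and $-J$ turns the same test into the saddle-point condition (2.7).

```python
from itertools import product


def pure_nash(A, B):
    """Pure-strategy Nash pairs (2.6) of a finite two-DM game stated in costs.

    DM 1 picks row i and pays A[i][j]; DM 2 picks column j and pays B[i][j].
    """
    rows, cols = len(A), len(A[0])
    return [
        (i, j)
        for i, j in product(range(rows), range(cols))
        if all(A[i][j] <= A[k][j] for k in range(rows))     # no profitable deviation for DM 1
        and all(B[i][j] <= B[i][l] for l in range(cols))    # no profitable deviation for DM 2
    ]


# Hypothetical prisoner's-dilemma costs (action 0 = cooperate, 1 = defect).
A = [[1, 3], [0, 2]]
B = [[1, 0], [3, 2]]
print(pure_nash(A, B))                                  # [(1, 1)]

# Zero-sum case J^1 = -J^2 = J: Nash pairs are saddle points satisfying (2.7).
J = [[3, 1], [4, 2]]
print(pure_nash(J, [[-v for v in row] for row in J]))   # [(0, 0)]
```

In the first game the unique Nash pair (1, 1) costs each DM 2, whereas (0, 0) would cost each DM 1: a Nash equilibrium need not be Pareto-optimal, which is the price paid for dropping cooperation.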
Stackelberg solution


Consider the class of systems with two agents in which the roles
are not symmetric. One of the DMs, known as the leader, is in a position to
announce his strategy ahead of time and enforce it on the other DM, known as
the follower. For each announced strategy $\pi^1 \in \Pi^1$ of the leader, we assume
that the follower acts rationally and determines his response by minimizing
$J^2(\pi^1, \pi^2)$ over $\Pi^2$. The set of all such solutions

$$ R(\pi^1) = \Big\{ \pi^{2*} \in \Pi^2 : J^2(\pi^1, \pi^{2*}) \le \min_{\pi^2 \in \Pi^2} J^2(\pi^1, \pi^2) \Big\} \tag{2.8} $$

is known as the rational response (reaction) set of the follower. In case
this is a singleton, we have the unique reaction function (mapping)

$$ T : \Pi^1 \to \Pi^2, \tag{2.9a} $$

so that the leader will now determine his equilibrium strategy by minimizing
$J^1(\pi^1, T\pi^1)$ over $\Pi^1$. Any strategy $\pi^{1*} \in \Pi^1$ with the property

$$ J^1(\pi^{1*}, T\pi^{1*}) \le J^1(\pi^1, T\pi^1), \qquad \forall \pi^1 \in \Pi^1, \tag{2.9b} $$

is known as a Stackelberg strategy for the leader. Note that $T$ is determined
here as the unique mapping satisfying

$$ J^2(\pi^1, T\pi^1) \le J^2(\pi^1, \pi^2), \qquad \forall \pi^2 \in \Pi^2, \tag{2.10} $$

for every $\pi^1 \in \Pi^1$, and with the property $T\pi^1 \in \Pi^2$. The strategy $\pi^{2*} = T\pi^{1*}$ for
the follower that corresponds to $\pi^{1*}$ under this mapping is known as the
equilibrium strategy of the follower under the Stackelberg mode of decision
making.


If $R(\pi^1)$ is not a singleton, there is no unique way of defining
the Stackelberg solution. One possibility is for the leader to secure his
losses against nonunique rational responses of the follower, and accordingly
to select a $\pi^{1*} \in \Pi^1$ that satisfies

$$ \sup_{\pi^2 \in R(\pi^{1*})} J^1(\pi^{1*}, \pi^2) \le \sup_{\pi^2 \in R(\pi^1)} J^1(\pi^1, \pi^2) \tag{2.11} $$

for all $\pi^1 \in \Pi^1$. This, too, we shall call the Stackelberg strategy for the
leader.
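For finite action sets, (2.8)-(2.11) translate directly into an enumeration. The Python sketch below is one such reading, assuming pure strategies only and a leader who guards against the worst element of the follower's reaction set as in (2.11); when every reaction set is a singleton it reduces to (2.9b). The matrices and the function name are hypothetical.

```python
def stackelberg(A, B, tol=1e-12):
    """Leader's Stackelberg strategy in a finite two-DM game stated in costs.

    The leader (DM 1) announces row i; the follower (DM 2) replies with a column.
    A[i][j] is the leader's cost and B[i][j] the follower's.  Ties in the follower's
    reaction set R(i) of (2.8) are resolved against the leader, as in (2.11).
    Returns (leader's row, R(row), leader's guaranteed cost).
    """
    best = None
    for i, (a_row, b_row) in enumerate(zip(A, B)):
        b_min = min(b_row)
        reaction = [j for j, v in enumerate(b_row) if v <= b_min + tol]   # R(i)
        worst = max(a_row[j] for j in reaction)                          # sup over R(i)
        if best is None or worst < best[2]:
            best = (i, reaction, worst)
    return best


# Hypothetical game: row 1 is dominant for the leader, so the unique Nash pair is (1, 1).
A = [[1, 4], [0, 3]]
B = [[0, 1], [1, 0]]
print(stackelberg(A, B))       # (0, [0], 1)

# Hypothetical game with a tie in R(0): the pessimistic rule (2.11) avoids row 0.
A2 = [[0, 5], [2, 2]]
B2 = [[0, 0], [1, 0]]
print(stackelberg(A2, B2))     # (1, [1], 2)
```

In the first game the Nash pair (1, 1) gives the leader a cost of 3; by committing to row 0 in advance, the leader induces the follower's column 0 and lowers his cost to 1. In the second game the follower is indifferent between both columns after row 0, and the pessimistic rule steers the leader to row 1.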


It is also possible to extend the Stackelberg solution concept to
systems with more than two DMs and possibly more than two levels of hierarchy.
In this extension, if there is more than one DM at any level of hierarchy,
we have to adopt either the Pareto-optimality or the Nash solution as an
equilibrium concept at that particular level. As a specific case, consider
an M-person decision problem with one leader and M-1 followers, and two levels
of hierarchy. Suppose that there is no cooperation among the followers; then
we adopt the Nash solution concept at the lower level of hierarchy and further
assume that the Nash solution is unique for every $\pi^1 \in \Pi^1$. Then there exist
M-1 reaction functions $T^i$, $i = 2, 3, \ldots, M$, such that, for every $\pi^1 \in \Pi^1$,

$$ J^i(\pi^1, \hat T^i\pi^1, T^i\pi^1) \le J^i(\pi^1, \hat T^i\pi^1, \pi^i), \qquad \forall \pi^i \in \Pi^i,\ i = 2, 3, \ldots, M, \tag{2.12a} $$

where

$$ \hat T^i\pi^1 := \{T^2\pi^1, T^3\pi^1, \ldots, T^{i-1}\pi^1, T^{i+1}\pi^1, \ldots, T^M\pi^1\}. \tag{2.12b} $$

A Stackelberg (hierarchical) strategy for the leader in this decision problem
is a $\pi^{1*} \in \Pi^1$ that satisfies

$$ J^1(\pi^{1*}, T^2\pi^{1*}, \ldots, T^M\pi^{1*}) \le J^1(\pi^1, T^2\pi^1, \ldots, T^M\pi^1) \tag{2.13} $$

for all $\pi^1 \in \Pi^1$.
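A static linear-quadratic instance shows how (2.12a)-(2.13) can be computed when the followers' Nash game has a closed-form solution. In the Python sketch below, which is a hypothetical example rather than one from the text, follower $i \in \{2, 3\}$ minimizes $J^i = \tfrac12 u_i^2 - u_i(u_1 - \tfrac12 u_j)$, so the lower-level Nash solution solves a linear system, and the leader minimizes $J^1 = (u_2 + u_3 - 2)^2 + u_1^2$ along the reaction functions by a simple grid search.

```python
import numpy as np

# Hypothetical static game: leader decision u1, followers' decisions u2 and u3 (scalars).
# Follower i minimizes J^i = 0.5*u_i**2 - u_i*(u1 - 0.5*u_j); its stationarity condition
# u_i + 0.5*u_j = u1 is linear, so the followers' Nash game is a 2x2 linear system.
C = np.array([[1.0, 0.5],
              [0.5, 1.0]])       # nonsingular, so the lower-level Nash solution is unique


def reactions(u1):
    """(T^2 u1, T^3 u1) of (2.12a), evaluated for an array of leader decisions."""
    u1 = np.atleast_1d(np.asarray(u1, dtype=float))
    return np.linalg.solve(C, np.vstack([u1, u1]))      # shape (2, len(u1))


def leader_cost(u1):
    """J^1(u1, T^2 u1, T^3 u1) with the hypothetical J^1 = (u2 + u3 - 2)**2 + u1**2."""
    u2, u3 = reactions(u1)
    return (u2 + u3 - 2.0) ** 2 + np.atleast_1d(u1) ** 2


grid = np.linspace(-3.0, 3.0, 60001)                    # step 1e-4
u1_star = grid[np.argmin(leader_cost(grid))]            # leader's strategy per (2.13)
print(u1_star, reactions(u1_star).ravel())              # approximately 0.96 and [0.64, 0.64]
```

Here $T^2 u_1 = T^3 u_1 = \tfrac23 u_1$, so the leader's cost along the reactions is $(\tfrac43 u_1 - 2)^2 + u_1^2$, minimized at $u_1 = 0.96$; the grid search recovers this value to within its step size.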
Decision problems that incorporate a hierarchy in decision making
are also known as coordination problems, and the leader is referred to as
the coordinator, since the presence of a hierarchy enables him to coordinate
the actions of the other decision makers. This is particularly true if the
leader's objective function comprises a convex combination of the objective
functions of the followers, in which case a Stackelberg strategy may force
the followers to a Pareto-optimal solution even though they will be acting
noncooperatively. Such possibilities will be discussed in the sections to
follow.


3. COORDINATION AND CONTROL IN DETERMINISTIC SYSTEMS


In this section we discuss coordination and control problems in the
context of deterministic systems and under deterministic information patterns.
First, we identify deterministic problems within the framework of the
formulation of §2.2 and delineate several deterministic information patterns
(§3.1). Then, we provide a brief discussion on team and Pareto-optimal
solutions and on representations of strategies on trajectories (§3.2), and
discuss Nash equilibria (§3.3) and Stackelberg solutions (§3.4); finally, we
discuss general coordination and control problems in deterministic systems.


3.1 Deterministic Systems and Deterministic Information Patterns

The class of deterministic systems to be considered in this section
will be a special case of the general formulation of §2.2, obtained by taking
all probability measures to be one-point; in other words, we take the state
equation to be given by

$$ x_{n+1} = f_n(x_n, u_n^1, \ldots, u_n^M), \tag{3.1} $$

with the value of $x_1$, the initial state, specified a priori, and the
stage-additive cost function to be given as

$$ L^m(u^1, \ldots, u^M) = \sum_{n=1}^{N} g_n^m(x_{n+1}, u_n^1, \ldots, u_n^M, x_n) \tag{3.2} $$

for DM m.

If a decision maker has access to only the initial value of the state
and does not acquire any (dynamic) information on the values of the state (or
controls) at other stages, we say that he has open-loop information. If, however,
he acquires perfect information concerning the current values of the state and
has perfect recall of the past acquired information, we say that his
information pattern is closed-loop (with memory). Hence, in the former case

$$ \eta_n^m = \{x_1\}, \qquad n \in \mathbf{N}, \tag{3.3a} $$

and in the latter case

$$ \eta_n^m = \{x_n, x_{n-1}, \ldots, x_1\}, \qquad n \in \mathbf{N}, \tag{3.3b} $$

for DM m, and these two information structures constitute the two extreme
possibilities as regards deterministic information structures that involve state
measurements. Two important cases "in between" are the feedback (or closed-loop
no-memory) information structure, in which the decision maker recalls
only the current value of the state (and also the initial state, which is
known a priori), i.e.,

$$ \eta_n^m = \{x_n, x_1\}, \qquad n \in \mathbf{N},\ m \in \mathbf{M}, \tag{3.4} $$

and the partial closed-loop information structure, in which the dynamic
state information that the decision maker acquires and recalls is only partial,
i.e.

$$ \eta_n^m = \{y_n^m, y_{n-1}^m, \ldots, y_2^m, x_1\}, \qquad n \in \mathbf{N},\ n \neq 1,\ m \in \mathbf{M}, \tag{3.5a} $$

where

$$ y_n^m = h_n^m(x_n), \qquad n \in \mathbf{N},\ n \neq 1,\ m \in \mathbf{M}, \tag{3.5b} $$

and $h_n^m$ is an appropriate function which is not necessarily one-to-one. Note
that in the partial closed-loop information structure each decision maker's
current observation (or measurement) $y_n^m$ may be different, and there is in
general no sharing of information. An information structure which permits
such sharing is, for example,

$$ \eta_n^m = \{y_n^m, y_{n-1}, y_{n-2}, \ldots, y_2, x_1\}, \tag{3.6a} $$

where

$$ y_k := \{y_k^1, y_k^2, \ldots, y_k^M\}, \tag{3.6b} $$

which is known as the one-step delay observation sharing pattern. It is,
of course, possible to introduce other information patterns which involve
sharing of only a subset of past observations and with possibly more than
one stage delay. Each such information structure leads to an appropriate
strategy space for each decision maker, for which we use the notation already
introduced in §2.2.
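The information patterns (3.3)-(3.6) differ only in which past quantities DM m may use at stage n. The Python sketch below makes this explicit for a hypothetical two-DM scalar system: it simulates (3.1) along a given control sequence, records measurements $y_n^m = h_n^m(x_n)$, and returns $\eta_n^m$ under each pattern. The dynamics, the measurement functions and the helper names are assumptions for illustration; stages and DMs are 1-indexed as in the text.

```python
def simulate(x1, controls, f, hs):
    """Simulate (3.1) and record measurements y_n^m = h_n^m(x_n) for stages n = 1..N."""
    xs, ys = {1: x1}, {}
    for n, us in enumerate(controls, start=1):
        ys[n] = tuple(h(xs[n]) for h in hs)      # (y_n^1, ..., y_n^M)
        xs[n + 1] = f(xs[n], *us)                # x_{n+1} = f_n(x_n, u_n^1, ..., u_n^M)
    return xs, ys


def information(xs, ys, n, m, pattern):
    """eta_n^m for DM m (1-indexed) at stage n under a deterministic information pattern."""
    if pattern == "open_loop":                   # (3.3a)
        return [xs[1]]
    if pattern == "closed_loop":                 # (3.3b)
        return [xs[k] for k in range(n, 0, -1)]
    if pattern == "feedback":                    # (3.4)
        return [xs[n], xs[1]]
    if pattern == "partial_closed_loop":         # (3.5a), defined for n >= 2
        return [ys[k][m - 1] for k in range(n, 1, -1)] + [xs[1]]
    if pattern == "one_step_delay_sharing":      # (3.6a)-(3.6b)
        shared = [ys[k] for k in range(n - 1, 1, -1)]   # full vectors y_k, k = n-1, ..., 2
        return [ys[n][m - 1]] + shared + [xs[1]]
    raise ValueError(f"unknown pattern: {pattern}")


xs, ys = simulate(
    x1=1.0,
    controls=[(0.5, -1.0), (1.0, 0.0), (-0.5, 0.5)],
    f=lambda x, u1, u2: x + u1 + u2,
    hs=(lambda x: x * x, lambda x: abs(x)),      # h^1 is not one-to-one
)
for p in ("open_loop", "closed_loop", "feedback",
          "partial_closed_loop", "one_step_delay_sharing"):
    print(p, information(xs, ys, n=3, m=1, pattern=p))
```

Because the hypothetical $h^1(x) = x^2$ is not one-to-one, DM 1 under the partial closed-loop pattern cannot tell $x_n$ from $-x_n$, whereas under one-step delay sharing he also receives DM 2's earlier measurements $|x_k|$.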


3.2 Team and Pareto-optimal Solutions

When all agents have a common goal (the case of a team problem) or
have different goals but act cooperatively (the case of Pareto-optimal
solution), the optimum solution can be obtained by utilizing techniques of
optimal control theory: in the former case there is a single objective
functional to be minimized, and in the latter case one may in general consider
a parameterized convex combination of all the agents' cost functionals as a
single objective functional to be minimized, whose parameterized solution
characterizes the Pareto-optimal set. Furthermore, in order to obtain a solution
under a given general deterministic information pattern, a standard approach is
first to obtain the minimizing solution under the open-loop information
structure and then to synthesize a closed-loop solution as a representation of
that open-loop solution in the strategy spaces compatible with the given dynamic
information. Before discussing this point further, let us introduce the
notion of "representations" of a strategy [cf. Başar (1980b)].

Definition 3.1. For an M-agent deterministic control (decision) problem with
strategy spaces $\{\Pi^m;\ m \in \mathbf{M}\}$, let the strategies of all the agents, except the
mth one, be fixed at $\pi^i \in \Pi^i$, $i \in \mathbf{M}$, $i \neq m$. Then, a strategy $\tilde\pi^m \in \Pi^m$ for DM m
is a representation of another strategy $\pi^m \in \Pi^m$, with $\pi^i \in \Pi^i$ ($i \in \mathbf{M}$, $i \neq m$)
fixed, if

(i) the M-tuples $\{\tilde\pi^m, \pi^i;\ i \in \mathbf{M}, i \neq m\}$ and $\{\pi^m, \pi^i;\ i \in \mathbf{M}, i \neq m\}$
generate the same unique state trajectory, and

(ii) $\tilde\pi^m$ and $\pi^m$ have the same open-loop value on this trajectory. □

A salient feature of team-optimal and Pareto-optimal solutions is
that under a given dynamic deterministic information structure, every
representation of a solution M-tuple also constitutes a solution to the problem.
However, in the cases of Nash equilibrium and Stackelberg solutions, this
property no longer holds true.
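Definition 3.1 and the remark that follows can be illustrated with a scalar example. The Python sketch below assumes the hypothetical dynamics $x_{n+1} = x_n + u_n^1 + u_n^2$ and an arbitrary feedback gain of 0.3, and builds a closed-loop representation of DM 1's open-loop strategy by adding a correction that vanishes on the nominal trajectory. Both strategies generate the same trajectory while DM 2's strategy stays fixed, but they respond differently once DM 2 deviates; this is one way to see why representations are interchangeable when all strategies are chosen jointly (team and Pareto problems) but not when unilateral deviations matter (Nash and Stackelberg problems).

```python
def run(x1, policy1, policy2, N):
    """Simulate x_{n+1} = x_n + u_n^1 + u_n^2; each policy maps (n, x_n) to a decision."""
    xs = [x1]
    for n in range(1, N + 1):
        x = xs[-1]
        xs.append(x + policy1(n, x) + policy2(n, x))
    return xs                                    # [x_1, ..., x_{N+1}]


N, x1 = 4, 2.0
u1_open = [-0.5] * N                             # DM 1's open-loop decisions u_1^1, ..., u_N^1


def open_loop(n, x):
    return u1_open[n - 1]                        # ignores the state


def dm2_fixed(n, x):
    return 0.0                                   # DM 2's strategy, held fixed


nominal = run(x1, open_loop, dm2_fixed, N)       # x_1^*, ..., x_{N+1}^*


def feedback_rep(n, x):
    # Same value as open_loop on the nominal trajectory, plus a correction that is
    # zero there (the gain 0.3 is arbitrary).
    return u1_open[n - 1] - 0.3 * (x - nominal[n - 1])


print(run(x1, feedback_rep, dm2_fixed, N) == nominal)   # True: (i) and (ii) hold


def dm2_deviates(n, x):
    return 0.1


print(run(x1, open_loop, dm2_deviates, N))
print(run(x1, feedback_rep, dm2_deviates, N))    # departs from the line above from x_3 on
```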

3.3. Nash Equilibria

Derivation of Nash equilibria, when M agents have different cost
functionals to minimize, involves the solution of the set of M inequalities
(2.6), which, depending on the underlying information structure, may be quite
a difficult problem, because each inequality defines an optimal control problem
that depends structurally on the other agents' strategies. However, if the
underlying information pattern is open-loop, the structure of each of these
optimal control problems is not affected by the other agents' control vectors,
and hence derivation of Nash equilibria in this case becomes equivalent to
solving (jointly) M optimal control problems. This argument then readily
leads, by using the minimum principle, to the following set of first-order
necessary conditions that yield the candidate open-loop Nash equilibrium
solutions [cf. Başar (1979a)].


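Proposition 3.1. For the multicriteria decision problem described by (3.1)
and (3.2), let $f_n(x_n, u_n^1, \ldots, u_n^M)$ and $g_n^m(x_{n+1}, u_n^1, \ldots, u_n^M, x_n)$ be continuously
differentiable in $x_n$ and $x_{n+1}$, $n \in \mathbf{N}$, $m \in \mathbf{M}$. Then, if
$\{\gamma_n^{m*}(x_1) = u_n^{m*};\ n \in \mathbf{N},\ m \in \mathbf{M}\}$ provides an open-loop Nash equilibrium solution
and $\{x_{n+1}^*;\ n \in \mathbf{N}\}$ is the corresponding state trajectory, there exists a finite
sequence of costate vectors $\{p_2^m, \ldots, p_{N+1}^m\}$ for each $m \in \mathbf{M}$ such that the
following relations are satisfied:

$$ x_{n+1}^* = f_n(x_n^*, u_n^{1*}, \ldots, u_n^{M*}), \qquad x_1^* = x_1, $$

$$ \gamma_n^{m*}(x_1) \equiv u_n^{m*} = \arg\min_{u_n^m \in U_n^m} H_n^m\big(p_{n+1}^m, u_n^{1*}, \ldots, u_n^{(m-1)*}, u_n^m, u_n^{(m+1)*}, \ldots, u_n^{M*}, x_n^*\big), $$

$$ p_n^m = \Big[\frac{\partial}{\partial x_n} f_n(x_n^*, u_n^{1*}, \ldots, u_n^{M*})\Big]' \Big[p_{n+1}^m + \Big(\frac{\partial}{\partial x_{n+1}} g_n^m(x_{n+1}^*, u_n^{1*}, \ldots, u_n^{M*}, x_n^*)\Big)'\Big] + \Big[\frac{\partial}{\partial x_n} g_n^m(x_{n+1}^*, u_n^{1*}, \ldots, u_n^{M*}, x_n^*)\Big]', $$

$$ p_{N+1}^m = 0, \qquad m \in \mathbf{M},\ n \in \mathbf{N}, $$

where

$$ H_n^m(p_{n+1}, u_n^1, \ldots, u_n^M, x_n) := g_n^m\big(f_n(x_n, u_n^1, \ldots, u_n^M), u_n^1, \ldots, u_n^M, x_n\big) + p_{n+1}' f_n(x_n, u_n^1, \ldots, u_n^M). $$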


For further discussion on the derivation of this set of first-order necessary
conditions, and elucidation of some special cases as regards the structures of
$f_n$ and $g_n^m$, we refer to [Başar and Olsder (1982), Chapter 6].
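For linear dynamics and quadratic costs, the conditions of Proposition 3.1 can be solved as a linear system. The Python sketch below does this for a hypothetical scalar two-DM instance of (3.1)-(3.2); the dynamics, weights and target d are assumptions. Instead of iterating the costate equations, it stacks the equivalent first-order conditions in each DM's open-loop sequence, which is legitimate here because each cost is strictly convex in the DM's own controls. It then verifies the Nash property (2.6) through best responses and checks DM 1's costate recursion and Hamiltonian stationarity from the proposition.

```python
import numpy as np

# Hypothetical scalar two-DM instance of (3.1)-(3.2):
#   x_{n+1} = x_n + u_n^1 + u_n^2,                 n = 1, ..., N
#   g_n^1 = q1 * x_{n+1}**2 + r1 * (u_n^1)**2
#   g_n^2 = q2 * (x_{n+1} - d)**2 + r2 * (u_n^2)**2
N, x1, d = 5, 2.0, 1.0
q1, r1, q2, r2 = 1.0, 1.0, 1.0, 2.0

L = np.tril(np.ones((N, N)))          # X = x1 + L @ (u1 + u2) stacks x_2, ..., x_{N+1}
G, I, one = L.T @ L, np.eye(N), np.ones(N)

# Joint first-order conditions: a linear system in the open-loop sequences (u1, u2).
K = np.block([[q1 * G + r1 * I, q1 * G],
              [q2 * G, q2 * G + r2 * I]])
rhs = np.concatenate([-q1 * L.T @ (x1 * one), -q2 * L.T @ ((x1 - d) * one)])
u = np.linalg.solve(K, rhs)
u1, u2 = u[:N], u[N:]
X = x1 * one + L @ (u1 + u2)          # X[i] = x_{i+2}

# Nash check (2.6): each sequence is the unique best response to the other.
br1 = np.linalg.solve(q1 * G + r1 * I, -q1 * L.T @ (x1 * one + L @ u2))
br2 = np.linalg.solve(q2 * G + r2 * I, -q2 * L.T @ ((x1 - d) * one + L @ u1))
print(np.allclose(br1, u1), np.allclose(br2, u2))            # True True

# Costate check for DM 1: here df_n/dx_n = 1 and g_n^1 does not depend on x_n, so
# p_n^1 = p_{n+1}^1 + 2*q1*x_{n+1} with p_{N+1}^1 = 0, and stationarity of H_n^1
# in u_n^1 reads 2*q1*x_{n+1} + 2*r1*u_n^1 + p_{n+1}^1 = 0.
p_next = np.zeros(N)                  # p_next[i] = p_{i+2}^1; p_next[N-1] = p_{N+1}^1 = 0
for i in range(N - 2, -1, -1):
    p_next[i] = p_next[i + 1] + 2 * q1 * X[i + 1]
print(np.allclose(2 * q1 * X + 2 * r1 * u1 + p_next, 0.0))    # True
```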

Another tractable class of problems, as far as derivation of Nash
equilibria is concerned, is the class of multicriteria decision problems with
closed-loop no-memory (feedback) information structure. Since every open-loop
Nash equilibrium solution is also a Nash equilibrium solution under the
closed-loop no-memory information structure, the Nash equilibrium solution to
this class of problems cannot be unique, and in fact it exhibits "informational
nonuniqueness" [see Başar and Olsder (1982)]. One way of eliminating this
informational nonuniqueness under the feedback information pattern is to require
the Nash equilibrium solution to have the additional property that its
restriction to the interval [n, N] is a Nash solution to the truncated version of
the original problem, defined on [n, N], and that this be so for all $n \in \mathbf{N}$. Such
a solution is known as a feedback Nash equilibrium solution; it is free of
any informational nonuniqueness, and its derivation follows a dynamic
programming type argument, as summarized in the following proposition.

*  • 

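Proposition 3.2. For the multicriteria decision problem described by (3.1)
and (3.2), and under the closed-loop no-memory (or closed-loop) information
pattern, the set of strategies $\{\gamma_n^{m*}(x_n);\ n \in N,\ m \in M\}$
provides a feedback Nash equilibrium solution if, and only if, there exist
functions $V_n^m(x)$, $n \in N$, $m \in M$, such that the following recursive
relations are satisfied:

$$
\begin{aligned}
V_n^m(x) &= \min_{u_n^m \in U_n^m} \Big[ g_n^m\big(\tilde f_n^m(x, u_n^m), \gamma_n^{1*}(x), \dots, \gamma_n^{(m-1)*}(x), u_n^m, \gamma_n^{(m+1)*}(x), \dots, \gamma_n^{M*}(x), x\big) + V_{n+1}^m\big(\tilde f_n^m(x, u_n^m)\big) \Big]\\
&= g_n^m\big(\tilde f_n^m(x, \gamma_n^{m*}(x)), \gamma_n^{1*}(x), \dots, \gamma_n^{M*}(x), x\big) + V_{n+1}^m\big(\tilde f_n^m(x, \gamma_n^{m*}(x))\big),\\
V_{N+1}^m(x) &= 0, \qquad m \in M,
\end{aligned}
$$

where

$$
\tilde f_n^m(x, u_n^m) \triangleq f_n\big[x, \gamma_n^{1*}(x), \dots, \gamma_n^{(m-1)*}(x), u_n^m, \gamma_n^{(m+1)*}(x), \dots, \gamma_n^{M*}(x)\big]. \qquad \square
$$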


It should be clear from the above that the feedback Nash equilibrium solution
can be obtained recursively, by solving a set of static Nash problems at each
stage, a feature that makes it computationally attractive. Another important
feature worth recording is that the feedback Nash solution is indeed a Nash
equilibrium solution under the closed-loop no-memory or closed-loop information
patterns (satisfying inequalities (2.6)), but only one of many
"informationally nonunique" equilibria under those dynamic information
structures.
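
The stagewise recursion of Proposition 3.2 can be sketched for a hypothetical
scalar linear-quadratic model, $x_{n+1} = a x_n + b_1 u_n^1 + b_2 u_n^2$ and
$g_n^m = q_m x_{n+1}^2 + r_m (u_n^m)^2$ (an illustrative assumption, not a
model from the text). In this case each value function is quadratic,
$V_n^m(x) = s_n^m x^2$, the feedback strategies are linear,
$\gamma_n^{m*}(x) = -k_n^m x$, and the static Nash problem at each stage
reduces to a 2-by-2 linear system in $(k_n^1, k_n^2)$. The function name is
illustrative.

```python
def feedback_nash(a, b, q, r, N):
    """Backward recursion of Proposition 3.2 for a hypothetical scalar LQ example.

    Model (assumed): x_{n+1} = a x_n + b[0] u^1 + b[1] u^2,
    g_n^m = q[m] x_{n+1}^2 + r[m] (u_n^m)^2, hence V_n^m(x) = s_n^m x^2
    and gamma_n^m(x) = -k_n^m x. Returns stage gains (stage 1 first) and s_1^m.
    """
    s = (0.0, 0.0)                          # V_{N+1}^m = 0
    gains = []
    for _ in range(N):                      # stages N, N-1, ..., 1
        c = (q[0] + s[0], q[1] + s[1])      # weight on x_{n+1}^2, incl. V_{n+1}^m
        # Static Nash at this stage: both stationarity conditions are linear in (k1, k2).
        a11, a12 = c[0] * b[0] ** 2 + r[0], c[0] * b[0] * b[1]
        a21, a22 = c[1] * b[0] * b[1], c[1] * b[1] ** 2 + r[1]
        rhs1, rhs2 = c[0] * b[0] * a, c[1] * b[1] * a
        det = a11 * a22 - a12 * a21         # positive whenever r[0], r[1] > 0
        k1 = (rhs1 * a22 - a12 * rhs2) / det
        k2 = (a11 * rhs2 - a21 * rhs1) / det
        phi = a - b[0] * k1 - b[1] * k2     # closed-loop coefficient
        s = (c[0] * phi ** 2 + r[0] * k1 ** 2, c[1] * phi ** 2 + r[1] * k2 ** 2)
        gains.append((k1, k2))
    gains.reverse()
    return gains, s


gains, s = feedback_nash(1.0, (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), N=3)
print(gains, s)
```

For a single stage with all parameters equal to one, both gains equal $1/3$
and $s_1^m = 2/9$; with only one stage the feedback and open-loop patterns
carry the same information, so this coincides with the open-loop Nash
solution of the same instance.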

As already mentioned above, when we have the closed-loop information pattern,
or any dynamic information pattern that exhibits redundancy in information,
Nash equilibria are informationally nonunique, and in fact there exists an
uncountable number of such equilibria. The reasons for this are made precise
in the following definition and proposition; a proof of the latter can be
found in (Başar and Olsder (1982), chapter 6).


Definition 3.2. Let A and B be two M-person N-stage deterministic
multicriteria decision problems which admit precisely the same extensive form
description (as in §2.2) except for the underlying information pattern (and,
of course, also the strategy spaces, whose descriptions depend on the
information pattern). Let $\eta_A^m$ (respectively, $\eta_B^m$) denote the
information pattern of DMm in problem A (respectively, B), and let the
inclusion relation $\eta_A^m \subseteq \eta_B^m$ mean that whatever DMm knows
at each stage of A he also knows at the corresponding stages of B, but not
necessarily vice versa. Then, A is informationally inferior to B if
$\eta_A^m \subseteq \eta_B^m$ for all $m \in M$, with strict inclusion for at
least one m. □

Proposition 3.3. Let A and B be two deterministic decision problems as
introduced in Definition 3.2, so that A is informationally inferior to B.
Furthermore, let the strategy spaces of the decision makers in the two
problems be compatible with the given information patterns and constraints (if
any) imposed on the controls, so that $\eta_A^m \subseteq \eta_B^m$ implies
$\Pi_A^m \subseteq \Pi_B^m$, $m \in M$. Then, (i) any Nash equilibrium solution
for A is also a Nash equilibrium solution for B; (ii) if
$\{\pi^{1*}, \dots, \pi^{M*}\}$ is a Nash equilibrium solution for B such that
$\pi^{m*} \in \Pi_A^m$ for all $m \in M$, it is also a Nash equilibrium solution
for A. □

Hence, multicriteria deterministic decision problems with dynamic information
patterns that exhibit redundancy in information are not well defined under
the Nash solution concept (since they admit a plethora of informationally
nonunique equilibria) unless some additional selection criteria are
introduced, such as the requirements imposed by the feedback Nash solution
discussed earlier. We do not pursue this point any further here, but note that
one such criterion is in fact provided in §4.3 under a stochastic set-up.


3.4.  Stackelberg  (Leader-Follower)  Solutions 


In this subsection, we treat the problem of optimal control and coordination
of deterministic systems under a hierarchical decision structure, and
investigate derivation of optimal control and coordination strategies by
employing the Stackelberg solution concept introduced in §2.3. As discussed in
§2.3, while introducing the Stackelberg solution concept, the existence of a
hierarchy in decision making results in an asymmetry in the roles of the
agents, with some of them being in a position to dictate their strategies to
the others.

In general, derivation of Stackelberg solutions in dynamic decision problems
is quite challenging, the difficulty being mostly of a conceptual nature.
However, for some special information structures, the problem becomes
tractable because some standard methods and techniques of optimization and
optimal control theory become applicable. One such class of problems is
characterized by the open-loop information structure and, for the sake of
simplicity in the discussion to follow, two agents (i.e. M = 2). Since the
leader's information structure is open-loop, the optimization problem faced by
the follower in the determination of his optimal response set (2.8) is
structurally independent of the leader's choice of strategy, and therefore the
first phase of the derivation of the Stackelberg solution is a feasible
(tractable) optimal control problem. In particular, if the follower's cost
functional is strictly convex in his control, the rational response set
$R(\pi^1)$ will be a singleton and the reaction function T [see (2.9a)] will be
determined completely by a set of necessary and sufficient conditions which,
under certain structural assumptions on $f_n$ and $g_n^2$, $n \in N$, will lead
to an analytical solution for T. If such an analytic solution can be found,
then the leader's optimization problem $\min_{\pi^1 \in \Pi^1} J^1(\pi^1, T\pi^1)$
is again a standard optimal control problem which can readily be solved using
the available techniques for dynamic optimization, and the open-loop
representation of this solution (in case it is obtained as a closed-loop
solution) will constitute a Stackelberg strategy for the leader. In case an
analytic expression for T does not exist, the necessary and sufficient
conditions that describe T will have to be treated as constraints in the
leader's optimization problem, which again involves no difficulties of a
conceptual nature.
A set of equations from which the solution of this constrained optimal control
problem can be obtained is given in (Başar and Olsder (1982), chapter 7); we do
not discuss this class of problems any further here. It is worth noting that
the preceding derivation is valid not only under the open-loop information
structure for both agents, but also when the follower has access to dynamic
state information: the only requirement is that the leader should have only
open-loop information. Furthermore, one can envisage direct extensions of this
procedure to M-agent problems with one leader and M-1 followers, with the
latter determining their policies according to the Nash or Pareto-optimum
solution concept, and with the leader having access to only open-loop
information; there appear to be no difficulties of a conceptual nature in such
an extension.
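
For a hypothetical scalar linear-quadratic instance (the illustrative form
$x_{n+1} = a x_n + b_1 u_n^1 + b_2 u_n^2$, $g_n^m = q_m x_{n+1}^2 + r_m (u_n^m)^2$,
not taken from the text), the two phases just described can be carried out
explicitly: the follower's reaction function T is affine in the leader's
open-loop control sequence, and substituting it into the leader's cost leaves
an ordinary quadratic minimization. The sketch below assumes NumPy, stacks the
states $x_2, \dots, x_{N+1}$ into a vector, and uses illustrative names.

```python
import numpy as np


def open_loop_stackelberg(a, b, q, r, x1, N):
    """Leader (agent 1) and follower (agent 2) open-loop controls, scalar LQ example."""
    # Stacked dynamics: X = phi*x1 + Psi[0] @ U1 + Psi[1] @ U2, X = (x_2, ..., x_{N+1}).
    phi = np.array([a ** (i + 1) for i in range(N)])
    Psi = [np.array([[a ** (i - j) * bm if j <= i else 0.0 for j in range(N)]
                     for i in range(N)]) for bm in b]
    I = np.eye(N)
    # Phase 1: the follower minimizes q2*|X|^2 + r2*|U2|^2 for fixed U1 (strictly
    # convex), which gives the reaction function T: U2 = G @ U1 + h.
    M2 = q[1] * Psi[1].T @ Psi[1] + r[1] * I
    G = -np.linalg.solve(M2, q[1] * Psi[1].T @ Psi[0])
    h = -np.linalg.solve(M2, q[1] * x1 * Psi[1].T @ phi)
    # Phase 2: the leader minimizes q1*|F @ U1 + e|^2 + r1*|U1|^2 with T substituted.
    F = Psi[0] + Psi[1] @ G
    e = phi * x1 + Psi[1] @ h
    U1 = -np.linalg.solve(q[0] * F.T @ F + r[0] * I, q[0] * F.T @ e)
    U2 = G @ U1 + h
    X = phi * x1 + Psi[0] @ U1 + Psi[1] @ U2
    return U1, U2, X


U1, U2, X = open_loop_stackelberg(1.0, (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), x1=1.0, N=1)
print(U1, U2, X)   # for N = 1: u^1 approx -0.2, u^2 approx -0.4, x_2 approx 0.4
```

In the single-stage case the leader's cost is $0.4^2 + 0.2^2 = 0.2$, lower than
the value $2/9$ each agent obtains at the Nash solution of the same instance.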

When the leader has access to dynamic state information, derivation of the
Stackelberg solution constitutes a challenging problem, and the standard
techniques of optimization do not apply, since the optimal control problem
characterizing the rational response set $R(\pi^1)$ is now structurally
dependent on the leader's choice of strategy. One way out of this difficulty
would be to fix the structure of the leader's possible strategies
parametrically, find the follower's rational response as a function of these
parameters, and then optimize the leader's cost functional over these
parameter values, also in view of the follower's response; this generally
leads to suboptimal strategies for the leader, the degree of suboptimality
depending on how representative the fixed structure is in the general class of
policies.

Another way of making the Stackelberg problem tractable is to require the
solution to have a feedback property (under the closed-loop no-memory or
closed-loop information pattern), analogous to the case of the feedback Nash
equilibrium solution, which leads to a recursive derivation in retrograde time
involving the solution of static Stackelberg problems at every stage. The
solution obtained through such a recursive procedure is called a feedback
Stackelberg solution [cf. Simaan and Cruz (1973a, b)] and satisfies the
conditions given in the following proposition.

Proposition 3.4. For the two-agent multicriteria decision problem described by
(3.1) and (3.2) with M = 2, and under the closed-loop no-memory (or
closed-loop) information structure, the set of strategies
$\{\gamma_n^{1*}(x_n), \gamma_n^{2*}(x_n);\ n \in N\}$ provides a feedback
Stackelberg solution with DM 1 as leader, if

$m = 1, 2$, $n \in N$,

and $G_n^m$ is defined recursively by

$$
G_n^m(\gamma_n^1, \gamma_n^2, x_n) = G_{n+1}^m(\gamma_{n+1}^{1*}, \gamma_{n+1}^{2*}, x_{n+1}) + g_n^m\big(x_{n+1}, \gamma_n^1(x_n), \gamma_n^2(x_n), x_n\big), \qquad G_{N+1}^m \equiv 0, \quad m = 1, 2,
$$

with $x_{n+1} = f_n(x_n, \gamma_n^1(x_n), \gamma_n^2(x_n))$.


The feedback Stackelberg solution corresponds to the case when the leader can
enforce his strategy on the follower only stagewise; however, if he has the
power and ability to declare and enforce his strategy several stages in
advance throughout the decision process, or from the very beginning for the
entire duration of the decision process, the cost that the leader incurs will
be lower than, or at least no higher than, his optimal cost under the feedback
Stackelberg solution. In other words, in contrast to the feature recorded
after Proposition 3.2 in the case of the feedback Nash solution, the feedback
Stackelberg solution is not necessarily a Stackelberg solution, i.e. it need
not satisfy (2.9b); conversely, a Stackelberg solution obtained under the
closed-loop no-memory information structure is not necessarily a feedback
Stackelberg solution. On the other hand, derivation of a Stackelberg solution
under dynamic state information is a considerably more difficult problem, for
which the standard techniques of optimization cannot be used.
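
Under hypothetical scalar linear-quadratic assumptions
($x_{n+1} = a x_n + b_1 u_n^1 + b_2 u_n^2$, $g_n^m = q_m x_{n+1}^2 + r_m (u_n^m)^2$;
not a model from the text), the retrograde recursion behind the feedback
Stackelberg solution can be sketched as follows. At each stage the follower's
response to the leader's stage control is computed first, the leader then
minimizes his stage cost-to-go $G_n^1$ with that response substituted, and
quadratic cost-to-go coefficients $s_n^m$ (so that $G^m$ evaluated at the
solution is $s_n^m x_n^2$) are propagated backwards. Names and parameters are
illustrative.

```python
def feedback_stackelberg(a, b, q, r, N):
    """Stagewise Stackelberg gains (leader = agent 1) for a scalar LQ example.

    Strategies are gamma_n^m(x) = -k_n^m x and cost-to-go is s_n^m x^2.
    """
    s1 = s2 = 0.0                      # G_{N+1}^m = 0
    gains = []
    for _ in range(N):                 # stages N, N-1, ..., 1
        c1, c2 = q[0] + s1, q[1] + s2
        # Follower: minimize c2*(z + b2*u2)^2 + r2*u2^2 with z = a*x + b1*u1,
        # so u2 = -(c2*b2/rho)*z and x_{n+1} = (r2/rho)*z.
        rho = c2 * b[1] ** 2 + r[1]
        w = r[1] / rho
        # Leader: minimize c1*(w*z)^2 + r1*u1^2 over u1, anticipating the response.
        k1 = c1 * w ** 2 * b[0] * a / (c1 * w ** 2 * b[0] ** 2 + r[0])
        k2 = c2 * b[1] * (a - b[0] * k1) / rho
        phi = a - b[0] * k1 - b[1] * k2    # equals (a - b1*k1)*w
        s1 = c1 * phi ** 2 + r[0] * k1 ** 2
        s2 = c2 * phi ** 2 + r[1] * k2 ** 2
        gains.append((k1, k2))
    gains.reverse()                    # gains[0] is stage 1
    return gains, (s1, s2)


gains, (s1, s2) = feedback_stackelberg(1.0, (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), N=1)
print(gains, s1, s2)   # approx [(0.2, 0.4)], 0.2, 0.32
```

With all parameters equal to one and a single stage, the leader's cost
coefficient is 0.2, below the value 2/9 that both agents obtain when the same
stage game is played under the Nash concept, which illustrates the advantage
of the leader's position.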

Another case treated recently in the literature is the closed-loop no-memory
information structure where the leader's strategy is a function of the current
state. This problem leads to a nonclassical control problem in which the
partial derivative of the leader's strategy with respect to the state appears.
It is shown in Papavassilopoulos and Cruz (1979a) that the optimal values of
the state, controls, and objective functions are not changed by using controls
which are more general than affine functions of the state. When the
measurement is a (possibly nonlinear) function of the state, the strategy may
be assumed to be affine in the measurement without loss of generality.

Quite recently, an indirect approach has been developed towards the solution
of such nonclassical optimization-decision problems when the leader has access
to redundant information (such as closed-loop state information). In the
sequel we discuss some aspects of this new approach and the derivation of the
dynamic Stackelberg solution.

Now,  for  the  general  two-agent  decision  problem  of  this  subsection, 
and  with  the  leader  having  access  to  closed-loop  state  information,  consider 
the  following  sequence  of  optimization  problems. 


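STEP 1. For a fixed set of state vectors $\{x_n,\ n \in N,\ n \neq 1\}$, say
$\{x_n = \bar x_n,\ n \in N,\ n \neq 1\}$, and leader's control vectors
$\{u_n^1,\ n \in N\}$, minimize

$$
g_N^2(x_{N+1}, u_N^1, u_N^2, x_N) + \sum_{n=2}^{N-1} g_n^2(x_{n+1}, u_n^1, u_n^2, x_n) + g_1^2(x_2, u_1^1, u_1^2, x_1) \qquad (3.7)
$$

over $u_n^2 \in U_n^2$, $n \in N$, subject to the constraints

$$
x_{n+1} = f_n(x_n, u_n^1, u_n^2), \quad n \in N; \qquad x_n = \bar x_n, \quad n \in N,\ n \neq 1. \qquad (3.8)
$$

Denote the solution of this problem by

$$
u_n^2 = z_n(\bar x_2, \dots, \bar x_N;\ u_1^1, \dots, u_N^1), \qquad n \in N. \qquad (3.9)
$$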


STEP 2: Now consider minimization of the function

$$
g_N^1(x_{N+1}, u_N^1, u_N^2, x_N) + \sum_{n=2}^{N-1} g_n^1(x_{n+1}, u_n^1, u_n^2, x_n) + g_1^1(x_2, u_1^1, u_1^2, x_1) \qquad (3.10)
$$

over the leader's controls $\{u_n^1 \in U_n^1,\ n \in N\}$ and the state values
$\{\bar x_n,\ n \in N,\ n \neq 1\}$, subject to (3.8) and (3.9). Denote the
minimizing solution by $\{u_n^{1*},\ n \in N\}$ and
$\{x_n^*,\ n \in N,\ n \neq 1\}$, and the corresponding value of expression
(3.10) by $\bar J^{1*}$.


The quantity $\bar J^{1*}$, thus obtained, provides, under a fairly general set
of conditions, a tight lower bound on the Stackelberg cost
$J^1(\pi^{1*}, T\pi^{1*})$ of the leader (as defined by (2.9b)). These
conditions basically involve existence of a strategy $\tilde\pi^1 \in \Pi^1$
for the leader which

(i) is a closed-loop representation of the open-loop policy
$\{u_n^{1*},\ n \in N\}$ on the trajectory $\{x_n = x_n^*,\ n \in N\}$, where
$u_n^{1*}$ and $x_n^*$ are as defined above, with $x_1^* = x_1$;

(ii) forces the minimum value of (3.7) to be attained at
$\{u_n^2 = z_n(x_2^*, \dots, x_N^*;\ u_1^{1*}, \dots, u_N^{1*}),\ n \in N\}$, with
the minimization problem defined by replacing $u_n^1$ in (3.7) and (3.8) by
$\tilde\gamma_n^1(\cdot)$, $n \in N$, and $\bar x_n$, $n \in N$, in (3.8) by
$x_n$, and retaining this new form of (3.8) as a constraint. Note that this
latter requirement is equivalent to the statement that the follower's rational
response to the leader's announced strategy $\tilde\pi^1$ should lead to the
trajectory $\{x_n^*,\ n \in N\}$ and have the open-loop representation
$\{u_n^{2*} = z_n(x_2^*, \dots, x_N^*;\ u_1^{1*}, \dots, u_N^{1*}),\ n \in N\}$.

Several recent papers have investigated, in special contexts, the
satisfiability of these two conditions and the derivation of corresponding
strategies ($\tilde\pi^1$) for the leader. Başar and Selbuz (1979a, b) have
shown that when the system equation is linear and the cost functionals are
quadratic, there are cases when $\bar J^{1*}$ coincides with the global minimum
value of $J^1$ (in particular, if the follower does not act in the last stage
of the game) and a corresponding Stackelberg strategy for the leader is of the
linear, one-step memory type. Tolwinski (1981) has shown that for the same
class of problems, use of nonlinear strategies by the leader extends the
parameter region for which the preceding properties of the solution hold true.
Papavassilopoulos and Cruz (1980), Başar and Olsder (1980) and Başar (1981d)
have investigated counterparts of these
results and their extensions in continuous time. Ho, Luh and Muralidharan
(1980), Ho, Luh and Olsder (1980), and Salman and Cruz (1981) have drawn
parallels between these results and incentive scheme design problems in
economics, and have discussed applications of these concepts to microeconomics
and social choice theory. Başar (1981e) and Tolwinski (1980) have discussed
possible extensions to multi-agent cases when there exist more than two levels
of hierarchy and several agents at every level of decision making. Başar and
Selbuz (1979b) show that if there exist two levels of hierarchy and more than
one agent in the followers' group, the leader can still retain his powerful
position by announcing an appropriate linear one-step memory strategy (for
linear-quadratic problems) that forces the followers (who are making their
decisions noncooperatively and under the Nash solution concept) to globally
minimize the leader's cost function. Başar (1980b) has further discussed
coordination aspects of such problems and has investigated the possibilities
for the leader to coordinate the followers in such a way that the resulting
solution will be Pareto-optimum, even though the followers may be acting
noncooperatively.
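
The mechanism behind such linear one-step memory strategies can be seen in a
deliberately small hypothetical example (not taken from the cited papers): the
follower acts only at stage 1, $x_2 = x_1 + v$, and the leader acts only at
stage 2, $x_3 = x_2 + w$, with costs $J^1 = x_3^2 + w^2 + v^2$ and
$J^2 = (x_3 - d)^2 + v^2$. Since the follower does not act in the last stage,
Step 1 forces $v = \bar x_2 - x_1$, and Step 2 reduces to minimizing $J^1$
jointly over $(v, w)$, so $\bar J^{1*}$ is the global (team) minimum
$x_1^2/3$, attained at $v^* = w^* = -x_1/3$. The leader then announces
$w = w^* + \beta (x_2 - x_2^*)$, which satisfies condition (i), and chooses
$\beta$ so that the follower's stationarity condition holds at $v^*$, which is
condition (ii); this requires $x_3^* \neq d$. All function names are
illustrative.

```python
def team_optimum(x1):
    """Minimizer of J1 = x3^2 + w^2 + v^2 with x2 = x1 + v, x3 = x2 + w.

    Stationarity: x3 + v = 0 and x3 + w = 0, hence v = w = -x1/3.
    """
    return -x1 / 3.0, -x1 / 3.0


def incentive_slope(x1, d):
    """beta making the team-optimal v* the follower's best response (needs x3* != d)."""
    v, w = team_optimum(x1)
    x3 = x1 + v + w
    # Follower's stationarity at v*: (1 + beta)*(x3* - d) + v* = 0.
    return -v / (x3 - d) - 1.0


def follower_response(x1, d, beta):
    """Follower's best response to the announced strategy w = w* + beta*(x2 - x2*)."""
    v_star, w_star = team_optimum(x1)
    x2_star = x1 + v_star
    g = 1.0 + beta
    c0 = g * x1 + w_star - beta * x2_star      # then x3 = g*v + c0
    # Minimize (g*v + c0 - d)^2 + v^2 over v (strictly convex in v).
    return -g * (c0 - d) / (g * g + 1.0)


x1, d = 3.0, 2.0
beta = incentive_slope(x1, d)                  # -2.0 for these values
print(beta, follower_response(x1, d, beta), team_optimum(x1))
```

With $\beta = 0$ (the leader simply playing $w^*$) the follower would choose
$v = 0$ instead of $v^* = -1$ for these values, so it is the dependence on
$x_2$ that enforces the team solution. Since $w^*$ and $x_2^*$ are linear in
$x_1$, the announced strategy is linear in $(x_2, x_1)$, i.e. of linear
one-step memory type.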

It  is  possible  to  extend  the  two-step  derivation  of  the  closed-loop 
Stackelberg  solution,  outlined  earlier  and  defined  through  the  optimization 
problems  (3.7)  -  (3.10),  to  the  case  when  the  leader's  information  is  partial 
closed-loop  [see  (3.5)].  In  this  case  the  two  optimization  problems  at  Steps 
1  and  2  will  be  replaced,  respectively,  by  the  following: 

STEP 1': Let the observation vector $y_n^1$, defined by (3.5b), belong to the
space $Y_n^1$. For a fixed set of observation vectors
$\{y_n^1 \in Y_n^1,\ n \in N,\ n \neq 1\}$, say
$\{y_n^1 = \bar y_n^1,\ n \in N,\ n \neq 1\}$, and leader's control vectors
$\{u_n^1,\ n \in N\}$, minimize

$$
\sum_{n=1}^{N} g_n^2(x_{n+1}, u_n^1, u_n^2, x_n) \qquad (3.11)
$$

over $u_n^2 \in U_n^2$, $n \in N$, and subject to the constraints

$$
x_{n+1} = f_n(x_n, u_n^1, u_n^2), \quad n \in N, \qquad (3.12a)
$$

$$
h_n(x_n) = \bar y_n^1, \quad n \in N,\ n \neq 1. \qquad (3.12b)
$$


Denote the solution of this optimization problem by

$$
u_n^2 = z_n(\bar y_2^1, \dots, \bar y_N^1;\ u_1^1, \dots, u_N^1), \qquad n \in N. \qquad (3.13)
$$

STEP 2': Now minimize the function

$$
\sum_{n=1}^{N} g_n^1(x_{n+1}, u_n^1, u_n^2, x_n) \qquad (3.14)
$$

over the leader's controls $\{u_n^1 \in U_n^1,\ n \in N\}$ and the measurement
values $\{\bar y_n^1 \in Y_n^1,\ n \in N,\ n \neq 1\}$, subject to (3.12a),
(3.13) and $\bar y_n^1 = h_n(x_n)$, $n \in N$, $n \neq 1$. Denote the minimizing
solution by $\{u_n^{1*},\ n \in N\}$ and $\{y_n^{1*},\ n \in N,\ n \neq 1\}$,
the resulting state trajectory by $\{x_n^*,\ n \in N\}$, and the corresponding
value of expression (3.14) by $\bar J^{1*}$.

The conditions for $\bar J^{1*}$ to provide a tight lower bound on the
Stackelberg cost $J^1(\pi^{1*}, T\pi^{1*})$ involve, in this case, existence of
a strategy $\tilde\pi^1 \in \Pi^1$ [$\Pi^1$ is defined here as the class of all
mappings compatible with the information structure given by (3.5a)] that
satisfies condition (i) of the perfect information case and, in addition,

(ii') forces the minimum value of (3.11) to be attained at
$\{u_n^2 = z_n(y_2^{1*}, \dots, y_N^{1*};\ u_1^{1*}, \dots, u_N^{1*}),\ n \in N\}$,
with the minimization problem defined by replacing $u_n^1$ in (3.11) and
(3.12a) by $\tilde\gamma_n^1(\cdot)$, $n \in N$, by replacing $\bar y_n^1$ in
(3.12b) by $y_n^1$, and by retaining these new forms of (3.12a)-(3.12b) as
constraints.

For further details on the satisfiability of these two conditions and the
derivation of the dynamic Stackelberg solution under partial state
information, we refer to Başar (1980c) and Zheng and Başar (1981); the latter
reference also investigates existence and derivation of affine Stackelberg
strategies in such problems.

4. COORDINATION AND CONTROL IN STOCHASTIC SYSTEMS

In this section we discuss coordination and control problems in the context of
stochastic systems and under both deterministic and stochastic information
patterns. We first delineate (in §4.1) several different information
structures that we shall encounter in our analysis, and then discuss (in §4.2)
derivation and properties of optimal solutions in stochastic team problems.
Subsequently, in §4.3 we discuss Nash equilibria and in §4.4 the Stackelberg
solution, for stochastic systems and under different information patterns.

4.1.  Information  Structures  in  Stochastic  Systems 

In  stochastic  systems  we  encounter  two  general  classes  of  information 
patterns,  viz.  deterministic  and  stochastic  patterns: 

a)  Deterministic  information  structures 

We have discussed these thoroughly in §3 in the context of deterministic
systems. The same patterns, namely the closed-loop perfect state, feedback,
one-step (k-step) delay perfect state, and partial closed-loop information
structures, are appropriate also in stochastic systems whenever the agents
have access to the value of the initial state and to some deterministic
information on the current and/or past values of the state.

b) Stochastic information structures

Assume that each agent has access to noisy measurements of the current value of
the state through a measurement equation of the type (2.2), and that agents are
also in a position to exchange some of their information (with or without
delay). In such a case we have basically three general types of information
structures, as described below:

i) Centralized information pattern: All agents exchange their measurements
without any delay, and also recall their past information, i.e.

$$
\eta_n^m = \{y_n, y_{n-1}, \dots, y_1\}, \qquad m \in M,\ n \in N, \qquad (4.1)
$$

where

$$
y_k \triangleq (y_k^1, y_k^2, \dots, y_k^M), \qquad k \in N.
$$

This is also known as a classical information pattern, and it could also
involve the past control values, i.e.

$$
\eta_n^m = \{y_n, y_{n-1}, \dots, y_1;\ u_{n-1}, u_{n-2}, \dots, u_1\}, \qquad m \in M,\ n \in N, \qquad (4.2)
$$

where

$$
u_k \triangleq (u_k^1, u_k^2, \dots, u_k^M).
$$

The two information structures (4.1) and (4.2) are not equivalent (even though
they generate the same sigma-field for each fixed set of control laws), and
only in team problems may they be used interchangeably without affecting the
minimum value of the common objective functional; this point will be further
discussed in §4.2.

ii) Quasi-classical information patterns: In this group we have the "one-step
delay observation (measurement) sharing pattern", in which case

$$
\eta_n^m = \{y_n^m, y_{n-1}, \dots, y_1\}, \qquad m \in M,\ n \in N, \qquad (4.3a)
$$

and the "one-step delay information sharing pattern", with

$$
\eta_n^m = \{y_n^m, y_{n-1}, \dots, y_1;\ u_{n-1}, u_{n-2}, \dots, u_1\}, \qquad m \in M,\ n \in N. \qquad (4.3b)
$$

In the former case all measurements are shared with a delay of one stage,
while in the latter case the past control values are also shared. Our earlier
comments regarding the equivalence of (4.1) and (4.2) are equally valid here
in the context of (4.3a) and (4.3b); more discussion on this issue will be
included in §4.2.

A more general type of quasi-classical information structure is the so-called
partially nested information structure, which we introduce next. Towards this
end, assume that the joint probability distribution of the random variables
associated with the stochastic system (2.1) and the measurement system (2.2)
is independent of the values of the state and the controls. Then, by iterative
substitution, (2.1) can be written as

$$
\begin{aligned}
x_{n+1} &= f_n(x_n, u_n, \theta_n)\\
&= f_n\big[f_{n-1}(x_{n-1}, u_{n-1}, \theta_{n-1}), u_n, \theta_n\big] = \cdots\\
&\triangleq \tilde f_n(x_1;\ u_n, u_{n-1}, \dots, u_1;\ \theta_n, \theta_{n-1}, \dots, \theta_1),
\end{aligned}
\qquad (4.4a)
$$


and thus the state at any stage can be expressed solely in terms of the past
controls, the past noise vectors and the initial state. In terms of this
notation, the measurement equation (2.2) can be written as

$$
y_n^m = \tilde h_n^m(x_1;\ u_{n-1}, \dots, u_1;\ \theta_n, \dots, \theta_1), \qquad n \in N,\ m \in M, \qquad (4.4b)
$$

that is, in this new form it depends only on the "primitive" random variables
and the control vectors. Now, we call an information structure
$\{\eta_n^m \subseteq \{y_n^1, \dots, y_n^M;\ y_{n-1}^1, \dots, y_{n-1}^M;\ \dots;\ y_1^1, \dots, y_1^M;\ u_{n-1}, \dots, u_1\},\ n \in N,\ m \in M\}$
partially nested if, whenever $\eta_n^m$ depends on $u_k^i$ for some $k \le n$
and $i \in M$ [either directly or through a measurement equation in the form
(4.4)], the inclusion relation $\eta_k^i \subseteq \eta_n^m$† holds, this being
so for every such dependence. In other words, if an information structure is
partially nested, an agent's information at a particular stage n can depend on
the control of some other agent at some stage $k \le n$ only if he also has
access to the information available to that agent at that stage k.

The one-step delay observation sharing pattern and the one-step delay
information sharing pattern introduced earlier are special types of partially
nested information patterns. The reason why we are interested in partially
nested information patterns is that stochastic optimization problems, and in
particular team problems, with such information patterns are considerably more
tractable than those with nonclassical information patterns, a concept to be
defined in the sequel.
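
The definition of a partially nested pattern can be checked mechanically. The
Python sketch below encodes each information set $\eta_n^m$ as a set of
symbolic items ("y", l, j) and ("u", l, j), and makes the simplifying
assumption (labeled in the code) that every measurement $y_n^j$ depends on
every control applied before stage n, as it would for generic coupled
dynamics. It then tests the inclusion $\eta_k^i \subseteq \eta_n^m$ for every
dependence, and confirms that the patterns (4.1), (4.3a) and (4.3b) pass,
while a decentralized pattern in which each agent recalls only its own
measurements is nonclassical. Agents are indexed from 0 and the constructor
names are illustrative.

```python
def depends_on(info, M):
    """Controls (k, i) on which an information set depends, directly or via measurements.

    Assumption: y_n^j depends on every control u_k^i with k < n (generic coupling).
    """
    deps = set()
    for kind, n, j in info:
        if kind == "u":
            deps.add((n, j))
        else:
            deps.update((k, i) for k in range(1, n) for i in range(M))
    return deps


def is_partially_nested(eta, M):
    """eta maps (stage n, agent m) to the information set of agent m at stage n."""
    return all(eta[(k, i)] <= info
               for info in eta.values()
               for (k, i) in depends_on(info, M))


def centralized(N, M):                              # pattern (4.1)
    return {(n, m): {("y", l, j) for l in range(1, n + 1) for j in range(M)}
            for n in range(1, N + 1) for m in range(M)}


def one_step_delay(N, M, share_controls=False):     # (4.3a), or (4.3b) if True
    eta = {}
    for n in range(1, N + 1):
        for m in range(M):
            info = {("y", n, m)} | {("y", l, j) for l in range(1, n) for j in range(M)}
            if share_controls:
                info |= {("u", l, j) for l in range(1, n) for j in range(M)}
            eta[(n, m)] = info
    return eta


def decentralized(N, M):                            # own measurements only
    return {(n, m): {("y", l, m) for l in range(1, n + 1)}
            for n in range(1, N + 1) for m in range(M)}


for name, eta in [("(4.1)", centralized(3, 2)), ("(4.3a)", one_step_delay(3, 2)),
                  ("(4.3b)", one_step_delay(3, 2, True)),
                  ("decentralized", decentralized(3, 2))]:
    print(name, is_partially_nested(eta, 2))  # True, True, True, False
```

The decentralized case fails because, for instance, agent 0's stage-2
information depends on agent 1's stage-1 control through $y_2^0$, yet does not
contain $y_1^1$.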

iii) Nonclassical information patterns. An information pattern is said to be
nonclassical if it is not partially nested. Equivalently, if
$\{\eta_n^m,\ n \in N,\ m \in M\}$ is nonclassical, there exists some set of
indices $\{n, k \in N;\ m, i \in M;\ n \ge k\}$ such that $\eta_n^m$ depends on
$u_k^i$ but $\eta_k^i \not\subseteq \eta_n^m$.

†This inclusion relation can be replaced by the somewhat more general
requirement that "the elements of $\eta_k^i$ can be recovered by measurable
transformations on the elements of $\eta_n^m$".

4.2. Solutions of Stochastic Team Problems

Under the centralized stochastic or deterministic information patterns,
stochastic team problems become equivalent to stochastic control problems, and
therefore the solution techniques developed for this latter class of problems
[see e.g. Bertsekas (1976)] are directly applicable to team problems. In this
context, it is immaterial whether the agents also have access to the values of
past controls, since there is a single goal and a single objective functional
to minimize. In other words, the minimum value of a team cost functional J will
be the same regardless of whether it is computed under (4.1) or (4.2); in that
sense we call the two information structures equivalent as far as the optimal
team solution is concerned. However, this feature is no longer valid in
multicriteria problems (under the Nash or Stackelberg solution concepts).

If the underlying information structure is not centralized, the derivation of
the optimal team solution is in general quite intractable. For some special
types of stochastic team problems and under the partially nested information
pattern, however, the derivation becomes tractable by conversion into an
equivalent static formulation. Before discussing this conversion, we first
state a related result [Proposition 4.1] on an important property of partially
nested information patterns in stochastic team problems:


Let
$\{\eta_n^m \subseteq \{y_n^1, \dots, y_n^M;\ y_{n-1}^1, \dots, y_{n-1}^M;\ \dots;\ u_{n-1}, \dots, u_1\},\ n \in N,\ m \in M\}$
be a partially nested information pattern, with the corresponding strategy
spaces denoted by $\{\Pi^m,\ m \in M\}$ and the corresponding sub-strategy
spaces by $\{\Gamma_n^m,\ n \in N,\ m \in M\}$. Let $\bar\eta_n^m$ denote, for
each $n \in N$, $m \in M$, the intersection of the finite sets $\eta_n^m$ and
$\{y_n^1, \dots, y_n^M;\ y_{n-1}^1, \dots, y_{n-1}^M;\ \dots;\ y_1^1, \dots, y_1^M\}$.
Note that $\{\bar\eta_n^m,\ n \in N,\ m \in M\}$ is also a partially nested
information structure, which does not involve any explicit dependence on past
control vectors [whereas $\eta_n^m$ may explicitly depend on controls]. Denote
the corresponding strategy spaces by $\{\bar\Pi^m,\ m \in M\}$ and the
sub-strategy spaces by $\{\bar\Gamma_n^m,\ n \in N,\ m \in M\}$. Consider a
stochastic team problem with cost functional $J(\pi^1, \pi^2, \dots, \pi^M)$
to be jointly minimized [over $\times_{m=1}^M \Pi^m$] by all agents. Then, we
have the following important result.

Proposition 4.1.

(i) To every fixed M-tuple of strategies
$\{\bar\pi^m \triangleq (\bar\gamma_1^m, \bar\gamma_2^m, \dots, \bar\gamma_N^m) \in \bar\Pi^m,\ m \in M\}$
there corresponds a unique set of strategies
$\{\pi^m = (\gamma_1^m, \gamma_2^m, \dots, \gamma_N^m) \in \Pi^m,\ m \in M\}$
such that the sigma-field generated by $\eta_n^m$ with
$u_n^m = \gamma_n^m(\eta_n^m)$, $n \in N$, $m \in M$, is equivalent to the
sigma-field generated by $\bar\eta_n^m$ with $u_n^m = \bar\gamma_n^m(\bar\eta_n^m)$,
$n \in N$, $m \in M$.

(ii) J admits a global minimum over $\times_{m=1}^M \Pi^m$ if and only if it
admits a global minimum over $\times_{m=1}^M \bar\Pi^m$, and the minimum values
of J in both cases are the same. □


This proposition is a consequence of the observation that, under
the partially nested information structure, any direct information concerning
the value of a control is redundant, since it can be recovered from the
measurement information once the control law is known. Consequently, additional
information concerning the values of past controls [provided that we still have
a partially nested information structure] does not help to improve upon the
globally optimal team solution. An implication of this property is that,
given a specific partially nested information pattern for a stochastic team
problem, we can construct a (larger or smaller) partially nested information
structure that is equivalent to it, and reconsider the original team problem
under this new information structure without affecting the optimal value of
the objective functional. What we gain in return for such a conversion is a
possible simplification in the derivation of the optimal team solution. The
team solution obtained under the new information structure can then be
expressed in terms of the original information structure. Examples of such an
indirect derivation of optimal solutions in stochastic team problems can be
found in Ho and Chu (1972) and Bagchi and Başar (1980), and they are primarily
linear-quadratic problems. A CAVEAT for the reader, at this point, is that
neither Proposition 4.1 nor any of these conversion techniques has a
counterpart in multi-criteria problems (under the Nash or Stackelberg solution
concept).
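The redundancy argument behind Proposition 4.1 can be checked on a finite sample space, where a sigma-field is simply a partition of the sample points. The sketch below is illustrative only: the toy measurement `h1`, the control law `gamma1` and the helper `partition` are hypothetical and not taken from the text. It compares the partition generated by a measurement alone with the one generated by the measurement together with the control value computed from it by a known law; because the control value is a function of the measurement, the two coincide.

```python
from collections import defaultdict

def partition(sample_space, observe):
    """Blocks of sample points that `observe` cannot tell apart.

    On a finite sample space this partition is the sigma-field
    generated by the observation.
    """
    blocks = defaultdict(set)
    for xi in sample_space:
        blocks[observe(xi)].add(xi)
    return {frozenset(block) for block in blocks.values()}

# Hypothetical toy model: xi takes values 0..7, the first agent sees y1 = xi mod 4.
omega = range(8)
h1 = lambda xi: xi % 4
gamma1 = lambda y1: 3 * y1 - 1            # any known deterministic control law

measurement_only = partition(omega, h1)
measurement_and_control = partition(omega, lambda xi: (h1(xi), gamma1(h1(xi))))

print(measurement_only == measurement_and_control)   # True
print(len(measurement_only))                         # 4
```

If the second component were not a function of the measurement (say `xi // 4`), the partition would become strictly finer, which is exactly the case the partially nested assumption rules out for control values.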

Let us now consider one special class of stochastic team problems
in some detail. Assume that the information structure is partially nested,
and that the measurement equations (4.4b) are separable in the control
variables, i.e. (4.4b) can be written as

$$y^m_n = H^m_n(x_1;\theta^m_n,\theta_1,\dots,\theta_{n-1}) + G^m_n(u^1,\dots,u^M),\qquad n\in N,\ m\in M. \qquad (4.5)$$


Here, the function $G^m_n$ depends on the control vectors in a way that is
consistent with the underlying partially nested information structure
$\{\eta^m_n;\ n\in N,\ m\in M\}$; i.e. $G^m_n$ is a function of $u^i_k$ only if $\eta^m_n$
includes $\eta^i_k$. Now, if $\{\pi^{m*}\in\Pi^m,\ m\in M\}$ denotes an optimal team solution
for a stochastic team problem with such a partially nested information
structure, and with a cost functional $J$, where

$$J(\pi) = E\{L(\xi,u^1,\dots,u^M)\mid u^m=\pi^m(\eta^m),\ m\in M\},$$

and $\xi$ denotes the collection of all primitive random variables, we have (from
the definition of team optimality)

$$J(\pi^*)\le J(\pi),\qquad \forall\,\pi^m\in\Pi^m,\ m\in M,$$

which implies the NM-tuple of inequalities

$$J(\pi^*)\le J\big(\pi^{1*};\dots;\pi^{(m-1)*};\gamma^{m*}_1,\dots,\gamma^{m*}_{n-1},\gamma^m_n,\gamma^{m*}_{n+1},\dots,\gamma^{m*}_N;\pi^{(m+1)*};\dots;\pi^{M*}\big),$$

$$\forall\,\gamma^m_n\in\Gamma^m_n;\ n\in N,\ m\in M.$$


This set of inequalities (also known as person-by-person optimality, if we
view each $u^m_n$ to be controlled by a different agent) therefore provides a
necessary condition for $\pi^*$ to be a team-optimal solution. Note that here,
all sub-strategies except $\gamma^m_n$ are held at their optimal values and the
resulting cost functional is minimized over possible strategies $\gamma^m_n\in\Gamma^m_n$;
hence each minimization problem is basically of the form

$$E\Big\{\min_{u^m_n}\int L^m_n(\xi,u^m_n)\,dP(\xi\mid\eta^m_n)\Big\} \qquad (4.6)$$

where

$$L^m_n(\xi,u^m_n)\triangleq L\big(\xi,\pi^{1*}(\eta^1);\dots;\gamma^{m*}_1(\eta^m_1),\dots,\gamma^{m*}_{n-1}(\eta^m_{n-1}),u^m_n,\gamma^{m*}_{n+1}(\eta^m_{n+1}),\dots,\gamma^{m*}_N(\eta^m_N);\dots;\pi^{M*}(\eta^M)\big),$$

and $P(\xi\mid\eta^m_n)$ is the conditional probability distribution of the primitive
random variables $\xi$ given the information vector $\eta^m_n$. This conditional
probability distribution is also known as the sufficient statistics for DM m
at stage n. $E\{\cdot\}$ denotes the expected value over the statistics of $\eta^m_n$,
after $u^m_n=\gamma^m_n(\eta^m_n)$ is determined. The reason why $L^m_n$ can be determined
explicitly as a function of $(\xi,u^m_n)$ is that the information structure is
causal, and hence other variables can be eliminated by iterative substitution.
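Why person-by-person optimality is only a necessary condition can be seen by brute force on a tiny static team. The sketch below uses a hypothetical example that is not from the text: a binary primitive variable ξ, equally likely, observed exactly by both agents, binary controls, and a cost that rewards coordinating on u = 1 more than on u = 0. Enumerating all strategy pairs shows that the team-optimal pair satisfies the inequalities above, but so does at least one pair with strictly higher cost.

```python
from itertools import product

XI = (0, 1)                                    # primitive variable, each value w.p. 1/2
STRATEGIES = list(product((0, 1), repeat=2))   # gamma = (gamma(0), gamma(1))
PAIRS = list(product(STRATEGIES, STRATEGIES))

def loss(xi, u1, u2):
    # Hypothetical cost (it happens not to depend on xi):
    # coordinating on 1 is best, coordinating on 0 is second best.
    if u1 == u2 == 1:
        return 0.0
    if u1 == u2 == 0:
        return 0.5
    return 1.0

def J(g1, g2):
    # Both agents observe xi exactly, so u^i = g_i[xi].
    return sum(0.5 * loss(xi, g1[xi], g2[xi]) for xi in XI)

def person_by_person_optimal(g1, g2):
    # No single agent can lower J by changing only his own strategy.
    return (all(J(g1, g2) <= J(t, g2) for t in STRATEGIES)
            and all(J(g1, g2) <= J(g1, t) for t in STRATEGIES))

team_opt = min(PAIRS, key=lambda p: J(*p))
pbp = [p for p in PAIRS if person_by_person_optimal(*p)]

print(team_opt, J(*team_opt))     # ((1, 1), (1, 1)) 0.0
print(((0, 0), (0, 0)) in pbp)    # True, although its cost is 0.5
```

In larger problems the same phenomenon is what makes numerical searches based only on the person-by-person conditions unreliable.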

Whenever $\eta^m_n$ is partially nested and the measurements that appear
in $\eta^m_n$ satisfy the separability condition (4.5), the sufficient statistics
have a simpler form which is basically static in nature. To see this,
first construct (in view of Proposition 4.1) the largest partially nested
information structure (say, $\hat\eta^m_n$) that is equivalent to $\eta^m_n$. This new
information structure $\hat\eta^m_n$ clearly has the property that whenever
$\eta^i_k\subset\hat\eta^m_n$ for any $k\le n$, $i\in M$, we have $u^i_k\in\hat\eta^m_n$. Because of the
separability of (4.5) and the partially nested property of $\hat\eta^m_n$, we have the
further (sigma-field) equivalence

$$\hat\eta^m_n\equiv\check\eta^m_n,$$

where $\check\eta^m_n$ is obtained from $\hat\eta^m_n$ by replacing all $y^i_k$ with

$$\hat y^i_k = H^i_k(x_1;\theta^i_k,\theta_1,\dots,\theta_{k-1}).$$

Therefore,

$$P(\xi\mid\eta^m_n) = P(\xi\mid\hat\eta^m_n) = P(\xi\mid\check\eta^m_n).$$

But, since $\check\eta^m_n$ is also partially nested, the presence of the control values
in $\check\eta^m_n$ does not provide any additional information, and we may as well
consider the smaller set $\bar\eta^m_n$, obtained from $\check\eta^m_n$ by deleting all control
values, which is totally static. Hence,

$$P(\xi\mid\check\eta^m_n) = P(\xi\mid\bar\eta^m_n),$$

which implies that there exists an equivalent static sufficient statistics
for DM m at stage n. This leads to the following important conclusion.
Proposition 4.2.

(i) Any stochastic team problem with dynamic partially nested
information structure $\{\eta^m_n,\ n\in N,\ m\in M\}$, whose measurement equations also
satisfy the property (4.5), is equivalent to one with a static information
structure $\{\bar\eta^m_n;\ n\in N,\ m\in M\}$ as constructed above, in the sense that the
optimal solution of one can be obtained from the optimal solution of the other.

(ii) If $\{u^m_n=\gamma^{m\circ}_n(\bar\eta^m_n),\ n\in N,\ m\in M\}$ denotes the optimal team
solution under the equivalent static information structure, the solution of
the original team problem can be expressed as $\{u^m_n=\gamma^{m\circ}_n(\bar\eta^m_n);\ n\in N,\ m\in M\}$,
where each $\hat y^i_k$ appearing in $\bar\eta^m_n$ is now computed from the original
measurements as

$$\hat y^i_k = y^i_k - G^i_k(u^1,\dots,u^M),$$

and where some of the controls are appropriately replaced with their optimum
values, in a way compatible with the underlying information structure. [If the
original information structure $\eta^m_n$ is the largest partially nested information
structure that is equivalent to itself, this latter phase is not required.] □
Remark. The separability condition (4.5) of the Proposition can be relaxed
to some extent. The real requirement here is that the conditional probability
$P(\xi\mid\eta^m_n)$ should be independent of the control laws, so that a static
information structure can be found with the property
$P(\xi\mid\eta^m_n)=P(\xi\mid\bar\eta^m_n)$. A more relaxed condition [than (4.5)] that
achieves this is given in Ho and Chu (1973). □
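The conversion behind Proposition 4.2 can be illustrated numerically. The sketch assumes a hypothetical two-stage scalar example (not from the text): x ~ N(0, 1); DM 1 observes y1 = x + v1 and applies an arbitrary, here possibly nonlinear, law u1 = γ1(y1); DM 2 knows y1 and γ1 (so the structure is partially nested) and observes y2 = x + v2 + u1, which is separable in the sense of (4.5) with G = u1. Subtracting the recoverable control term gives ŷ2 = y2 − u1 = x + v2, which is the same whichever γ1 is used, so the conditional law of the primitive variables given DM 2's information does not depend on the control law.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x, v1, v2 = rng.standard_normal((3, n))   # primitive random variables (samples)
y1 = x + v1                               # DM 1's static measurement

def stage_two(gamma1):
    u1 = gamma1(y1)                       # DM 1's control
    y2 = x + v2 + u1                      # DM 2's measurement, separable in u1
    y2_hat = y2 - u1                      # DM 2 recomputes u1 from y1 and gamma1
    return y2, y2_hat

y2_a, y2_hat_a = stage_two(np.tanh)
y2_b, y2_hat_b = stage_two(lambda y: 3.0 * y + 1.0)

print(np.allclose(y2_a, y2_b))            # False: raw measurement depends on gamma1
print(np.allclose(y2_hat_a, y2_hat_b))    # True: static equivalent does not
```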

The result of Proposition 4.2 is very useful in stochastic team problems,
because derivation of the optimal team solution under static information is
in general much simpler than the derivation under dynamic information. In
particular, for the special case when (i) the measurement equations are linear
in the primitive random variables and the controls, (ii) the primitive random
variables are jointly Gaussian distributed, (iii) the cost functional L is
quadratic in the control vectors and the primitive random variables, and (iv) L
is further strictly convex in the control variables, the unique team-optimal
solution under static information is affine in the available information and
can readily be computed by solving the set of minimization problems (4.6) [see
Radner (1962), Ho and Chu (1972)]. Therefore, every linear-quadratic-Gaussian
stochastic team problem with strictly convex cost functional and partially
nested information structure admits a team-optimal solution that is affine in
the available information, a result which follows directly from Radner's
above-mentioned result in view of Proposition 4.2. Furthermore, team-optimal
control laws can be obtained recursively when the partially nested information
pattern is the one-step delay information sharing pattern [Kurtaran (1975),
Sandell and Athans (1974), Yoshikawa (1975)] or the one-step-delay observation
sharing pattern [Başar (1978a)]. The solution is unique in the latter case and
nonunique in the former case; the nonuniqueness arises because the one-step
delay information sharing pattern includes redundant information, which gives
rise to several different "representations" [see Başar (1978a)].
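For a static linear-quadratic-Gaussian team, the minimization problems (4.6) reduce to linear equations in the coefficients of linear laws. A minimal sketch, assuming a hypothetical scalar team that is not from the text: x ~ N(0, s_x), y_i = x + v_i with independent v_i ~ N(0, s_i), and L = (u1 + u2 − x)² + u1² + u2², which is strictly convex in the controls. Agent i's condition gives u_i = a_i y_i with a_i = k_i(1 − a_j)/2, where k_i = s_x/(s_x + s_i) is the gain in E[x | y_i] = k_i y_i. The function names below are illustrative.

```python
def radner_static_team(s_x=1.0, s_1=1.0, s_2=1.0, iters=100):
    """Iterate the person-by-person conditions u_i = a_i y_i (linear laws)."""
    k1 = s_x / (s_x + s_1)        # E[x | y1] = k1 * y1
    k2 = s_x / (s_x + s_2)        # E[x | y2] = k2 * y2
    a1 = a2 = 0.0
    for _ in range(iters):        # alternate agent-by-agent best responses
        a1 = k1 * (1.0 - a2) / 2.0
        a2 = k2 * (1.0 - a1) / 2.0
    return a1, a2

def team_cost(a1, a2, s_x=1.0, s_1=1.0, s_2=1.0):
    """E[(a1 y1 + a2 y2 - x)^2 + (a1 y1)^2 + (a2 y2)^2] from second moments."""
    e_y1y1, e_y2y2, e_y1y2, e_xy = s_x + s_1, s_x + s_2, s_x, s_x
    err = (a1**2 * e_y1y1 + a2**2 * e_y2y2 + 2 * a1 * a2 * e_y1y2
           - 2 * (a1 + a2) * e_xy + s_x)
    return err + a1**2 * e_y1y1 + a2**2 * e_y2y2

a1, a2 = radner_static_team()
print(round(a1, 6), round(a2, 6))   # 0.2 0.2
```

Because the cost is strictly convex, the stationary point of these person-by-person conditions is the unique team optimum; in this example any perturbation of (a1, a2) increases `team_cost`.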

If the underlying information structure in a stochastic team problem
is nonclassical, derivation of the optimal team solution meets with formidable
difficulties. Even in the simplest type of linear-quadratic-Gaussian team
problem with a two-step delay information sharing pattern (i.e. a nonclassical
information pattern), the optimal solution is nonlinear and cannot be obtained
analytically; moreover, even a numerical derivation is a challenging task,
because such problems admit several person-by-person optimal solutions and
local optima [see Witsenhausen (1968)]. There are also no simple sufficient
statistics for such problems with nonclassical information patterns [see
Yoshikawa and Kobayashi (1978), and Varaiya and Walrand (1978)]. These
difficulties are due to the fact that each control has in general a "triple"
role in stochastic team problems [Ho (1980)]: (i) the deterministic control
effort of reducing the error, (ii) improving the future knowledge of the
uncertainty, and (iii) "signaling" to the agents acting in the future some
useful information which they will not necessarily acquire [in the case of
classical or quasi-classical information patterns, this third role is absent];
and these three roles are in general in conflict with each other. Only when
these roles are isolated do stochastic team problems with nonclassical
information patterns tend to be comparatively tractable [see Witsenhausen
(1975), and Ho, Kastner and Wong (1978)]; but this is indeed a very special
class of problems, and the more general nonclassical stochastic team problems
await innovative ideas, techniques and approaches.
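The difficulty can be made concrete on Witsenhausen's (1968) counterexample. The sketch below assumes the standard statement of that problem, which is not spelled out in this chapter: x0 ~ N(0, σ²); the first controller applies u1 = γ1(x0), so x1 = x0 + u1; the second observes y = x1 + v with v ~ N(0, 1) and applies u2 = γ2(y); the team cost is E[k²u1² + (x1 − u2)²]. Restricting both controllers to linear laws, u1 = (λ − 1)x0 and u2 equal to the best linear estimate of x1 = λx0, gives the closed-form cost coded below. The benchmark values k = 0.2, σ = 5 are an assumption chosen for illustration. A search over λ only returns the best linear cost; Witsenhausen showed that suitable nonlinear laws can do strictly better, which is precisely what makes such problems hard.

```python
import numpy as np

def affine_cost(lam, k=0.2, sigma=5.0):
    """Cost of u1 = (lam - 1) x0 and u2 = best linear estimate of x1 = lam x0 from y."""
    first_stage = k**2 * (lam - 1.0) ** 2 * sigma**2                  # E[k^2 u1^2]
    estimation_error = lam**2 * sigma**2 / (1.0 + lam**2 * sigma**2)  # E[(x1 - u2)^2]
    return first_stage + estimation_error

grid = np.linspace(-2.0, 2.0, 40001)
costs = affine_cost(grid)                  # works elementwise on NumPy arrays
best_lam = grid[np.argmin(costs)]
print(best_lam, costs.min())               # best cost achievable with linear laws only
```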

4.3  Nash  Equilibria 

Derivation of Nash equilibria for stochastic systems controlled by
several agents with different objective functionals is, in general, an
extremely challenging problem when the information pattern is nonclassical,
the reasons being similar to those we have discussed above at some length
in the context of team problems. Therefore, we will confine our discussion
in the sequel to deterministic, and stochastic classical and quasi-classical
information patterns.

We have seen in §3.3 that in the case of deterministic systems with
deterministic dynamic information patterns, there exists, in general, a
multitude of Nash equilibria, the reason being that in such problems (i) every
control law has several different "representations" and (ii) every Nash
equilibrium obtained under an information structure that is inferior to the
original deterministic information structure constitutes a Nash solution also
under the original information structure. We call such equilibria
"informationally nonunique" Nash solutions. For stochastic systems of the type
(2.1), however, informationally nonunique Nash equilibria cannot occur, even
under deterministic dynamic state information, provided that (roughly speaking)
the noise vector "influences" all points in the state space X, for every
$n\in N$ [Başar (1976, 1979a)]. A more precise statement can be given for the case
when $\theta_n$ has an additive effect, that is, when (2.1) is written as

$$x_{n+1} = f_n(x_n,u^1_n,\dots,u^M_n) + \theta_n. \qquad (4.7)$$


The requirement here is that the probability measure associated with $\theta_n$
should assign positive probability to every open subset of X [assuming that
an appropriate topology is defined on X] [Van Damme (1980)]. Such a stochastic
formulation ensures existence of a unique representation for every strategy and
hence eliminates the possibility of having informationally nonunique Nash
equilibria under dynamic state information (such as the closed-loop perfect
state information). The only nonuniqueness (if any) will be due to the
structures of the cost functions and the state equation.
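The role of the noise in removing different "representations" can be seen in a toy simulation. The sketch assumes a hypothetical scalar system x_{n+1} = a x_n + u_n + θ_n with a single controller (not a model from the text) and compares two closed-loop laws: u_n = −k x_n, and u_n = −k x_n + c(x_n − (a − k)x_{n−1}), whose extra term vanishes along the noise-free trajectory of the first law. Without noise the two laws generate identical controls, so they are representations of the same control; with additive noise the extra term is generally nonzero and the laws become distinguishable.

```python
import numpy as np

def simulate(law, a=0.9, k=0.5, c=0.5, noise=None, x1=1.0, steps=6):
    """Controls generated by `law` on x_{n+1} = a x_n + u_n + theta_n."""
    xs, us = [x1], []
    for n in range(steps):
        x_prev = xs[n - 1] if n > 0 else None
        u = law(xs[n], x_prev, a, k, c)
        us.append(u)
        theta = 0.0 if noise is None else noise[n]
        xs.append(a * xs[n] + u + theta)
    return np.array(us)

def law_a(x, x_prev, a, k, c):
    return -k * x

def law_b(x, x_prev, a, k, c):
    # Extra term vanishes whenever x = (a - k) * x_prev, i.e. on the noise-free path.
    correction = 0.0 if x_prev is None else c * (x - (a - k) * x_prev)
    return -k * x + correction

rng = np.random.default_rng(1)
theta = 0.1 * rng.standard_normal(6)

print(np.allclose(simulate(law_a), simulate(law_b)))          # True
print(np.allclose(simulate(law_a, noise=theta),
                  simulate(law_b, noise=theta)))              # False
```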

Consider now the case when the system equation is given by (4.7) [with
the probability measure of $\theta_n$ having the property discussed above], the
underlying information structure is closed-loop perfect state, and the cost
functional of DM m ($m\in M$) is given by

$$J^m = E\Big\{\sum_{n=1}^{N} g^m_n(x_{n+1},u^1_n,\dots,u^M_n,x_n)\ \Big|\ u^m_n=\gamma^m_n(\eta^m_n),\ n\in N,\ m\in M\Big\}.$$


For such problems the Nash equilibrium solution can be computed recursively,
by following a dynamic programming type argument and by solving at each stage
a static Nash problem. Assuming that each $f_n(\cdot)$ and $g^m_n$ ($n\in N$, $m\in M$) is
continuously differentiable in its arguments, and $\{\theta_n,\ n\in N\}$ is an
independent sequence, the recursive relation that yields the Nash solution
$\{u^{m*}_n=\gamma^{m*}_n(x_n);\ n\in N,\ m\in M\}$ reads [cf. Başar (1979a)]:

$$\int \nabla_{u^m_n}\Big[G^m_{n+1}(x_{n+1}) + g^m_n(x_{n+1},u^1_n,\dots,u^M_n,x_n)\Big]_{u^i_n=\gamma^{i*}_n(x_n),\ i\in M}\,dQ(\theta_n) = 0,$$

$$G^m_n(x_n) = \int\Big[G^m_{n+1}(x_{n+1}) + g^m_n(x_{n+1},u^1_n,\dots,u^M_n,x_n)\Big]_{u^i_n=\gamma^{i*}_n(x_n),\ i\in M}\,dQ(\theta_n),$$

$$G^m_{N+1}=0,\qquad m\in M,\ n\in N,$$

where

$$x_{n+1} = f_n(x_n,u^1_n,\dots,u^M_n) + \theta_n,$$

and $Q$ denotes the probability measure of $\theta_n$. It should be noted that
"informational nonuniqueness" is absent here, mainly because of our assumption
on $\theta_n$ ($n\in N$), and it is for this reason that every solution set will be a
function of only the current value of the state. When the state equation is
linear and each cost functional is quadratic, a unique solution can be obtained
under some invertibility conditions on the system matrices, and the Nash
control laws are affine functions of the current values of the state (depending
only on the mean value of $\theta_n$) [Başar (1979a)].
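In the linear-quadratic case the recursion reduces to solving, at each stage, a small linear system for the feedback gains. The following is a minimal sketch under hypothetical assumptions that are not from the text: two players, scalar state x_{n+1} = a x_n + b1 u¹_n + b2 u²_n + θ_n with independent zero-mean θ_n, and stage costs g^m_n = q_m x²_{n+1} + r_m (u^m_n)². With quadratic value functions G^m_{n+1}(x) = p^m_{n+1} x² + const, the stationarity condition above gives two linear equations for the gains in u^m_n = −K^m_n x_n; the zero-mean noise only adds constants, so it does not affect the gains. The function name `feedback_nash_gains` is illustrative.

```python
import numpy as np

def feedback_nash_gains(a, b, q, r, N):
    """Backward recursion for a two-player scalar LQ game with additive noise.

    b, q, r hold one entry per player. Returns gain pairs (K^1_n, K^2_n),
    n = 1..N, with u^m_n = -K^m_n * x_n.
    """
    b, q, r = map(np.asarray, (b, q, r))
    p = np.zeros(2)                        # G^m_{N+1} = 0
    gains = []
    for _ in range(N):
        s = q + p                          # weight on x_{n+1}^2 for each player
        # Player m's condition: r_m K_m + s_m b_m (b_1 K_1 + b_2 K_2) = s_m b_m a
        M = np.array([[r[0] + s[0] * b[0] ** 2, s[0] * b[0] * b[1]],
                      [s[1] * b[1] * b[0], r[1] + s[1] * b[1] ** 2]])
        K = np.linalg.solve(M, s * b * a)  # static Nash problem at this stage
        a_cl = a - b @ K                   # closed-loop coefficient
        p = s * a_cl ** 2 + r * K ** 2     # quadratic part of G^m_n
        gains.append(K)
    return gains[::-1]                     # reorder as stages 1..N

gains = feedback_nash_gains(a=1.0, b=[1.0, 1.0], q=[1.0, 1.0], r=[1.0, 1.0], N=5)
print(gains[-1])                           # last stage: both gains equal 1/3
```

The invertibility of the 2×2 matrix `M` at each stage plays the role of the invertibility conditions mentioned in the text for this toy case.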

When the underlying information structure is quasi-classical, derivation
of the Nash equilibrium solution is a more subtle issue. Firstly, Propositions
4.1 and 4.2 do not have any counterparts here, which totally removes the
possibility of simplifying the information structure (such as reconsidering
the original problem under an "equivalent" static information structure).
Secondly, if the underlying information pattern is the one-step delay
information sharing pattern, there exists, in general, a plethora of
"informationally nonunique" Nash equilibria, because that particular
information pattern incorporates redundancy in dynamic information [each agent
having access to past measurements as well as to past control values of the
other agents]. [See Başar (1978a) for a class of such informationally
nonunique Nash equilibria.] In order to avoid informationally nonunique
equilibria, we have to restrict our attention to those quasi-classical
information patterns which are free of any redundancy in dynamic information,
such as the one-step-delay observation sharing pattern.
The derivation of Nash equilibria under the one-step-delay observation
sharing pattern is not a totally intractable problem, and, depending on the
structures of the cost functions, measurement equations and state equation,
one can utilize a partially recursive procedure (of the dynamic programming
type) that would yield the equilibrium solution. This procedure (whenever it
works) involves, at each stage, the solution of static stochastic Nash problems
and the satisfaction of some consistency conditions; however, as a caveat for
the reader, we should mention that such a derivation is not routine and it
involves several pitfalls, mainly due to the fact that the conditional
distribution of the state at each stage (given the past and present acquired
information) depends in general on the past control laws [hence, the
derivations at each stage cannot be totally isolated, as in the case of the
stochastic team problems discussed in §4.2].

Let us now outline this procedure in some general terms, pointing
out the difficulties as they arise. Suppose that the Nash equilibrium solution
has been determined up to the last stage, and we are faced with the "static"
last-stage Nash problem which has the cost function

$$J^m_N = E\{g^m_N(f_N(x_N,u^1_N,\dots,u^M_N)+\theta_N,\ u^1_N,\dots,u^M_N,\ x_N)\mid u^i_N=\gamma^i_N(\eta^i_N),\ i\in M\}$$

for DM m, where the probability distribution of $x_N$ depends on the past controls
through the state equation (4.7). Denote the Nash solution of this problem by

$$u^{m*}_N=\gamma^{m*}_N(\eta^m_N)=\varphi^m_N(y^m_N;\ y_{N-1},\dots,y_1),\qquad m\in M. \qquad (4.8)$$


[Derivation of this solution will in general be quite difficult; however, the
difficulty is not a conceptual one but rather a computational one. We will
discuss this point further, in the sequel, for the special case of
linear-quadratic problems.] Here, $\varphi^m_N$ will depend on the conditional
probability distribution of $x_N$, and thereby on the past control laws. Now, if
the structural dependence of $\varphi^m_N$ on $y^m_N$ depends explicitly on the past
control laws, the procedure cannot be carried over to the next stage, since
$y^m_N$ also depends on $\{u^i_{N-1},\ i\in M\}$, and therefore the general structure of
the Nash problem at stage N-1 will depend (implicitly) on the solution that is
being sought. This difficulty can, however, be avoided if (4.8) happens to be
separable in $y^m_N$, i.e.

$$\varphi^m_N(y^m_N;\ y_{N-1},\dots,y_1)=\bar\varphi^m_N(y^m_N)+\psi^m_N(y_{N-1},\dots,y_1), \qquad (4.9)$$


with the further property that $\bar\varphi^m_N$ is functionally independent of the
past control laws. In such a case, the dependence of the Nash equilibrium
strategies at stage N on the controls at stage N-1 [i.e. $\{u^i_{N-1},\ i\in M\}$] is
completely determined by the functions $\{\bar\varphi^m_N,\ m\in M\}$, and therefore we can
proceed to the next stage (N-1) for determination of the Nash control laws
$\{\gamma^{m*}_{N-1},\ m\in M\}$ by substituting

$$\tilde u^m_N=\bar\varphi^m_N(y^m_N)+k^m_N(y_{N-1},\dots,y_1),\qquad m\in M, \qquad (4.10)$$

in the state equation and the cost functionals, where $k^m_N$ is any measurable
function of its arguments. The "static" Nash problem (of interest) at stage
N-1 will then involve the cost functionals


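$$J^m_{N-1}=E\Big\{g^m_{N-1}\big(f_{N-1}(x_{N-1},u^1_{N-1},\dots,u^M_{N-1})+\theta_{N-1},\ u^1_{N-1},\dots,u^M_{N-1},\ x_{N-1}\big)+g^m_N\big(f_N(x_N,\tilde u^1_N,\dots,\tilde u^M_N)+\theta_N,\ \tilde u^1_N,\dots,\tilde u^M_N,\ x_N\big)\ \Big|\ u^i_{N-1}=\gamma^i_{N-1}(\eta^i_{N-1}),\ i\in M\Big\},\qquad m\in M,$$

where $\tilde u^m_N$ is given by (4.10), $x_N$ is related to the controls at stage N-1 through

$$x_N=f_{N-1}(x_{N-1},u^1_{N-1},\dots,u^M_{N-1})+\theta_{N-1},$$

and $y^m_N$ is related to the past controls through

$$y^m_N=h^m_N(x_N,\theta^m_N).$$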


Now suppose that, for a fixed set of sub-strategies at stages N-2, N-3, ..., 1,
the solution of this Nash problem exists and is given by

$$\gamma^m_{N-1}(\eta^m_{N-1})=\varphi^m_{N-1}(y^m_{N-1})+\psi^m_{N-1}(y_{N-2},\dots,y_1),\qquad m\in M, \qquad (4.11)$$

where $\varphi^m_{N-1}$ is functionally independent of the past control laws, but it may
depend on $\{k^i_N,\ i\in M\}$, which in turn are tied, at equilibrium, to the second
term $\psi^i_N$ in (4.9). Invoking the consistency condition, and re-solving for
$\gamma^{m*}_{N-1}$ from (4.11), we obtain the structural form

$$\gamma^{m*}_{N-1}(\eta^m_{N-1})=\bar\varphi^m_{N-1}(y^m_{N-1})+\bar\psi^m_{N-1}(y_{N-2},\dots,y_1),\qquad m\in M,$$

where $\bar\varphi^m_{N-1}$ does not depend on either the past controls or $\{k^i_N,\ i\in M\}$.
Hence, we can now let

$$\tilde u^m_{N-1}=\bar\varphi^m_{N-1}(y^m_{N-1})+k^m_{N-1}(y_{N-2},\dots,y_1),\qquad m\in M,$$

where $k^m_{N-1}$ is any measurable function of its arguments, and repeat the deeds
of stage N-1 at stage N-2. Then, the solution can be obtained inductively by
invoking the consistency requirement at every stage, under the assumptions
that at every stage a Nash equilibrium solution to the related static problem
exists and that it satisfies a separability condition of the type (4.9) or (4.11).

The above-outlined procedure has been implemented in Başar (1978b)
for the class of linear-quadratic-Gaussian (LQG) systems under the
one-step-delay observation sharing pattern, and existence of a unique Nash
solution, linear in the available information, has been verified under some
sufficiency conditions that involve the system parameters. The "static"
stochastic Nash problem to be solved at each stage is of the linear-quadratic
type, whose solution is discussed in Başar (1975) and Başar (1978a); it may be
considered as an extension of Radner's result [Radner (1962)] referred to in
§4.2 to problems with different objectives for different agents. We should
mention that the solution of the general LQG problem given in Başar (1978b) is
highly complicated in terms of the equations which yield the coefficient
matrices of the linear control laws, and it does not satisfy any separation
property (as opposed to the solution of the LQG team problem under the same
information pattern).
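The "static" stochastic Nash problem solved at each stage has the same linear structure as the static team problem, except that each agent uses his own cost. A minimal sketch under hypothetical assumptions (not from the text): x ~ N(0, s_x), y_m = x + v_m with independent v_m ~ N(0, s_m), and costs J^m = E[½(u^m)² + α_m u^m u^j − β_m u^m x], j ≠ m. The first-order conditions u^m = E[β_m x − α_m u^j | y^m], with u^j = a_j y^j, give the two linear equations a_m + α_m k_m a_j = β_m k_m, where k_m = s_x/(s_x + s_m); a unique linear Nash solution exists whenever this 2×2 system is nonsingular.

```python
import numpy as np

def static_lq_nash(alpha, beta, s_x=1.0, s=(1.0, 1.0)):
    """Coefficients (a_1, a_2) of the linear Nash laws u^m = a_m y^m."""
    k = np.array([s_x / (s_x + s[0]), s_x / (s_x + s[1])])   # E[x | y_m] = k_m y_m
    A = np.array([[1.0, alpha[0] * k[0]],
                  [alpha[1] * k[1], 1.0]])
    rhs = np.array([beta[0] * k[0], beta[1] * k[1]])
    return np.linalg.solve(A, rhs)

print(static_lq_nash(alpha=(0.5, 0.5), beta=(1.0, 1.0)))   # both coefficients equal 0.4
```

With α_m = 0 the coupling disappears and each agent simply applies β_m E[x | y_m], which is a quick sanity check on the equations.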

When the underlying information structure is nonclassical, derivation
of the Nash equilibrium solution is in general not tractable, since even the
special case of nonclassical team problems involves formidable difficulties,
as discussed earlier in §4.2. However, there exists a subclass of problems
with totally conflicting goals, whose Nash equilibrium solutions (rather
called saddle-point solutions in this context) can be obtained explicitly
(and analytically) even under nonclassical information patterns, mainly
because in such problems the controls of the agents do not have a "triple"
role (i.e. the signaling aspect is absent). For example, Witsenhausen's
counterexample [Witsenhausen (1968)], when cast in such a framework, admits a
unique Nash (saddle-point) equilibrium that is linear in the available
nonclassical information [see Başar and Mintz (1972)]. For more discussion on
such solvable stochastic problems with nonclassical information patterns, see
Başar and Mintz (1971, 1973).

4.4.  Hierarchical  Decision  Structure 

In  this  subsection,  we  discuss  the  problem  of  optimal  control  and 
coordination  of  stochastic  systems  under  hierarchical  decision  structure,  by 
employing  the  Stackelberg  solution  concept  introduced  in  §2.2  and  elaborated 
on  in  §3.4  for  deterministic  systems.  Let  us  first  direct  our  attention  to 
the case of two agents with different goals, and with DM 1 (called the leader)
being in a position to enforce his strategy on DM 2 (known as the follower).

Information structure again plays a crucial role in such problems,
and solvability of a specific problem depends to a great extent on the nature
of the underlying information pattern. We should mention, at the outset, that
stochastic decision problems in which the leader has access to static or
dynamic redundant information (such as the one-step delay information sharing
pattern) are much more tractable than those in which the leader has dynamic
(non-redundant) information (such as the one-step delay observation sharing
pattern); this latter class of problems is in fact extremely challenging, and
to date no general method exists that would aid in their solution.

Static  information 

When the leader has access to static information [more precisely,
if the leader's information does not depend on the controls of the follower],
the stochastic Stackelberg problem is tractable because the rational response
set of the follower does not structurally depend on the strategy of the leader.
Such problems are then essentially equivalent to one-stage stochastic
Stackelberg problems†, which we now discuss. In terms of the standard
notation, let

$$J^m=E\{g^m(u^1,u^2,\xi)\mid u^i=\pi^i(\eta^i),\ i=1,2\},\qquad m=1,2,$$

where

$$\eta^i=\{y^i\},\qquad y^i=h^i(\xi),\qquad i=1,2,$$

and $\xi$ denotes a collection of primitive random variables with known probability
distributions. Let $\pi^1\in\Pi^1$ be fixed, where $\Pi^1$ is appropriately defined. Then,
the follower is faced with the stochastic minimization problem

$$\min_{\pi^2\in\Pi^2}E\{g^2[\pi^1(h^1(\xi)),\pi^2(y^2),\xi]\}=\min_{u^2}E\{g^2[\pi^1(h^1(\xi)),u^2,\xi]\mid y^2\}, \qquad (4.12)$$

†When the follower has access to dynamic information, there is no loss of
generality in replacing it with an appropriate static information.
whose structure does not depend on the choice of $\pi^1$, since $\pi^1$ does not carry
$u^2$ in its argument. If $g^2$ is strictly convex in $u^2$, this minimization problem
admits a unique solution [regardless of the choice of $\pi^1$], which we denote
by $T:\Pi^1\to\Pi^2$, so that $\pi^2=T\pi^1$ uniquely solves (4.12). The Stackelberg
strategy $\pi^{1*}$ is then any solution of the stochastic minimization problem

$$\min_{\pi^1\in\Pi^1}E\{g^1[\pi^1(y^1),(T\pi^1)(y^2),\xi]\}=\min_{u^1}E\{g^1[u^1,T\pi^1,\xi]\mid y^1\}. \qquad (4.13)$$

The two optimization problems (4.12) and (4.13) can be solved (at least
numerically) without any major difficulty of a conceptual or methodological
nature, and in a few cases the solution can be obtained analytically. One such
specific case is the class of linear-quadratic-Gaussian systems [$g^1$ and $g^2$
quadratic, $h^1$ and $h^2$ linear, and $\xi$ Gaussian], for which the Stackelberg
solution is affine. More precisely, we have the following result from Başar
(1979a, 1980a).

Proposition  4.3. 

Let $\xi=(x,\theta^1,\theta^2)$ be Gaussian distributed with mean zero and covariance
$\mathrm{diag}(\Sigma,\Lambda^1,\Lambda^2)$. Further let

$$g^m(x,u^1,u^2)=\tfrac12 u^{m\prime}D^m_{mm}u^m+u^{m\prime}D^m_{mi}u^i+\tfrac12 u^{i\prime}D^m_{ii}u^i+u^{m\prime}C^m_m x+u^{i\prime}C^m_i x;\qquad m,i=1,2;\ i\ne m,\quad D^m_{mm}>0,$$

$$h^m(x,\theta^m)=H^m x+\theta^m,\qquad m=1,2.$$

Then, the stochastic Stackelberg problem with static information
$\eta^m=\{y^m\}$, $m=1,2$, admits the unique solution

$$\gamma^{1*}(y^1)=Ay^1;\qquad \gamma^{2*}(y^2)=-(D^2_{22})^{-1}\big[D^2_{21}A\,E[y^1\mid y^2]+C^2_2\,E[x\mid y^2]\big],$$

where A is the unique solution of the Lyapunov-type equation


D11A  +  *D21D221>13D221)2l  "  D12D2^21  ’  D21D221>{2^  A  Z2H  Z1 
-  [(D12  -  D'1D221D13)D-21C22  f  W2lcuV2*\  -  CUS1 


$$\Gamma^m\triangleq\Sigma H^{m\prime}(H^m\Sigma H^{m\prime}+\Lambda^m)^{-1},\qquad m=1,2,$$


provided  that  the  condition 


0  <  I  +Du/2r^lD-21DnD-21D21  -  DuD-21D21  -  d’J/2  < 


holds.  □ 
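The two-step structure of (4.12) and (4.13) can be carried out by hand in a scalar LQG special case. The sketch assumes hypothetical costs that are not from the text: g² = (u² − u¹ − x)² for the follower and g¹ = (u¹ − x)² + (u²)² for the leader, with x ~ N(0, s_x) and y^m = x + v^m, v^m ~ N(0, s_m) independent. For a leader law u¹ = A y¹ the follower's reaction (the operator T in (4.12)) is u² = E[u¹ + x | y²] = k₂(A + 1)y², with k₂ = s_x/(s_x + s₂). Substituting this into the leader's expected cost and minimizing over A solves (4.13) within linear laws. This is a worked special case, not the general formula of Proposition 4.3.

```python
import numpy as np

def leader_cost(A, s_x=1.0, s_1=1.0, s_2=1.0):
    """Leader's expected cost when u1 = A y1 and the follower reacts optimally."""
    k2 = s_x / (s_x + s_2)
    B = k2 * (A + 1.0)                                 # follower's reaction: u2 = B y2
    own = A**2 * (s_x + s_1) - 2.0 * A * s_x + s_x     # E[(A y1 - x)^2]
    return own + B**2 * (s_x + s_2)                    # + E[(B y2)^2]

def stackelberg_gain(s_x=1.0, s_1=1.0, s_2=1.0):
    """Minimizer of leader_cost, from setting its derivative in A to zero."""
    k2 = s_x / (s_x + s_2)
    return (s_x - s_x * k2) / (s_x + s_1 + s_x * k2)

A_star = stackelberg_gain()
A_nash = 1.0 / (1.0 + 1.0)        # leader ignoring the reaction: A = gain of E[x | y1]
print(A_star, leader_cost(A_star))    # 0.2 and about 1.4
print(A_nash, leader_cost(A_nash))    # 0.5 and 1.625
```

The comparison shows the value of the leader's position: by anticipating the follower's reaction the leader lowers his own expected cost relative to the law he would use if he ignored it.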

Remark. The preceding result may be considered as an extension of Radner's
result on LQG teams, cited in §4.2, to problems with different objective
functionals for the agents and with a hierarchical decision structure. Even
though this specific result pertains to the two-agent case, its extensions
to the multi-agent case with more than two levels of hierarchy (in decision
making) can be envisioned; such problems (when cast in the LQG framework)
also admit unique affine solutions, but the verification and the derivation
are much more complicated than in the two-agent case [Başar (1981c)]. Yet
another extension (and application) of Proposition 4.3 would be to dynamic
decision problems under the feedback (stagewise) Stackelberg solution concept,
in which case the leader enforces his strategy on the follower only stagewise.
For LQG dynamic systems under the one-step-delay observation sharing pattern,
it can be shown, by repeated application of Proposition 4.3 together with a
dynamic programming type argument, that the feedback Stackelberg solution is
affine in the information available to the two agents [see Başar (1979)]. □

Dynamic  redundant  information 

We have discussed earlier in §4.3 that the presence of redundancy in the
dynamic information structure gives rise to ill-posed problems in the case of
Nash equilibria, because it leads to a plethora of informationally nonunique
solutions. For problems with a hierarchical decision structure, however, the
situation is quite the opposite. This time, the presence of redundancy in the
dynamic information actually helps to simplify the derivation of the
Stackelberg solution, because the extra freedom allotted to the leader through
the redundancy enables him to provide incentives or implement threats for the
follower, so as to force him to the most favorable solution [from the leader's
point of view]. We have already elucidated this property of redundant dynamic
information in §3.4 for deterministic systems, and in the following we discuss
it for stochastic systems within the context of a specific model.

Consider the general two-agent decision problem treated earlier in
this subsection, but under the amended information structure

$$\eta^1=\{y^1,y^2,u^2\},\qquad \eta^2=\{y^2\},$$

that is, the leader also has access to the measurement and control value of
the follower. [Of course, this makes sense only if the follower acts before
the leader does, which we assume to be valid in this case.] Now let

$$\min_{\pi^1\in\Pi^1,\ \pi^2\in\Pi^2}E\{g^1[\xi,\pi^1(y^1,y^2),\pi^2(y^2)]\}$$

exist and be determined uniquely by

$$u^1=\pi^{1t}(y^1,y^2),\qquad u^2=\pi^{2t}(y^2).$$

Let $E\{g^2[\xi,\pi^{1t},\pi^{2t}]\}\triangleq g^{2*}$, and let there exist a $\bar\pi^1\in\Pi^1$ such that

$$\min_{\pi^2\in\Pi^2}E\{g^2[\xi,\bar\pi^1(y^1,y^2,u^2),u^2]\mid u^2=\pi^2(y^2)\}>g^{2*}.$$

Then, by announcing the strategy

$$\pi^{1*}(y^1,y^2,u^2)=\begin{cases}\pi^{1t}(y^1,y^2) & \text{if } u^2=\pi^{2t}(y^2),\\ \bar\pi^1(y^1,y^2,u^2) & \text{otherwise},\end{cases}$$

the  leader  can  force  the  follower  to  adopt  the  strategy  tt*  ,  and  thereby  incur 

* 

an  overall  favorable  cost  value.  We  can  therefore  declare  tt1  as  a  Stackelberg 

strategy  for  the  leader  and  consider  the  problem  solved.  However,  for  several 

1* 

reasons,  one  may  wish  to  replace  the  essential  threat  tt  with  a  "softer"  in¬ 
centive  scheme  which  penalizes  the  follower  proportionately  to  his  deviation 

from  the  desired  solution.  Such  incentive  schemes  (which  are  basically  dif- 

l* 

ferent  representations  of  tt'*  )  do  exist,  and  for  several  discussions  and 


derivations,  as  well  as  on  extensions  of  this  approach,  we  refer  to  Ba?ar 
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(1980a),  and  also  Co  Ho,  Luh  and  Mulidharan  (1981).  Extensions  Co  Che  case  of 
multi-levels  of  hierarchy  are  discussed  in  Bafar  (1981a). 
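The threat construction can be checked on a toy problem. The sketch below is a deterministic, single-stage instance (no ξ and no measurements, so strategies reduce to decisions u^1 and u^2 on a finite grid); the costs `g1`, `g2` and the punishment rule `punish` are hypothetical choices made so that the threat condition above holds, with g^{2*} = 4 and the punished follower cost never below 9.

```python
# Toy deterministic instance of the leader's threat strategy (hypothetical costs).

def g1(u1, u2):
    """Leader's cost; its joint (team) minimizer is (1, 1)."""
    return (u1 - 1.0) ** 2 + (u2 - 1.0) ** 2


def g2(u1, u2):
    """Follower's cost."""
    return (u2 + 1.0) ** 2 + (u1 - u2) ** 2


U1_STAR, U2_STAR = 1.0, 1.0           # team-optimal decisions for the leader's cost
G2_STAR = g2(U1_STAR, U2_STAR)        # follower's cost at the team solution (4.0)


def punish(u2):
    """Hypothetical threat: forces (u1 - u2)^2 = 9, so g2 >= 9 > G2_STAR."""
    return u2 + 3.0


def announced(u2):
    """Leader's announced strategy: cooperate if the follower complies, else punish."""
    return U1_STAR if u2 == U2_STAR else punish(u2)


grid = [i / 4 for i in range(-12, 13)]  # follower's admissible decisions
best_u2 = min(grid, key=lambda u2: g2(announced(u2), u2))
print(best_u2, g1(announced(best_u2), best_u2))  # 1.0 0.0
```

Without the threat, a follower facing the fixed decision u^1 = 1 would choose u^2 = 0 (cost 2 < 4), leaving the leader with cost 1 instead of 0; the announced strategy removes that temptation.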


Non-redundant dynamic information

For the procedure outlined above to work, the information structure of the leader should be such that if, at stage $n\in N$, $\eta^1_n$ depends on $u^2_k$ for some $k<n$ [either directly or through the measurement equation], then he should know both the value of $u^2_k$ and the information on which it is based. With such an information structure, which incorporates redundancy, the leader can, in general, enforce the solution that is most favorable to him. If the information structure is dynamic, but does not incorporate any redundancy, the Stackelberg solution is extremely difficult to obtain, unless one parameterizes the desired solution and converts the original dynamic optimization problem to a static one (over those parameter values). Such an approach, of course, leads in general to suboptimal Stackelberg solutions. Even for linear-quadratic stochastic systems with perfect state measurements, there is no known method to obtain the closed-loop Stackelberg solution, and the linear suboptimal solution can only be obtained numerically, with the coefficient matrices depending on the statistical parameters of the additive system noise [see Başar (1979a)].

Table 1 recapitulates, in a nutshell, the known results and the yet-unsolved problems in the control and coordination of stochastic systems with multiple decision makers and under different types of information, together with related references. We have classified the problems in four categories:


(1) Completely solved ones; the remaining details are of a minor nature.

(2) Not completely solved. Any new result on this class of problems will constitute a contribution to the field, but a totally innovative approach is not required.

(3) Some "positive" and "negative" partial results on special cases exist, but this general class of problems is extremely challenging, and innovative approaches have to be introduced in order to solve a sufficiently general class of such problems.

(4) These problems are ill-posed, mainly because they lead to a plethora of solutions which cannot be strictly ordered.

The references quoted in the Table are not meant to be exhaustive; we have chosen to list the most recent or the most representative ones in each subcategory.

TABLE 1: A display of the current "state of knowledge" on the control and optimization of discrete-time stochastic systems.


5. APPLICATIONS

In this section we consider a few simplified situations where the concepts of multiperson decision-making are meaningful. The examples are intended to suggest potential areas where the concepts may be used as guides in decision-making.

5.1. Nash Equilibrium Model of an Arms Race

Richardson's model [Richardson (1960)] of an arms race between two nations,

$$\dot{x}_1(t)=\sigma x_2(t)-\alpha x_1(t)+g, \tag{5.1}$$

$$\dot{x}_2(t)=\rho x_1(t)-\gamma x_2(t)+h, \tag{5.2}$$

has generated some interest in political science in the further exploration of mathematical models in international relations. The arms levels (at time t) of the two nations are represented by $x_1(t)$ and $x_2(t)$; $\sigma$ and $\rho$ are called defense coefficients, $\alpha$ and $\gamma$ are called fatigue coefficients, and $g$ and $h$ are grievance coefficients. Discretizing time, the model may be represented in multistage form as

$$x_1(k+1)=a_{12}(k)x_2(k)+a_{11}(k)x_1(k)+b_1(k), \tag{5.3}$$

$$x_2(k+1)=a_{21}(k)x_1(k)+a_{22}(k)x_2(k)+b_2(k). \tag{5.4}$$
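The text does not say which discretization leads from (5.1)-(5.2) to (5.3)-(5.4). One simple possibility, assumed here purely for illustration, is a forward-Euler step of length Δ (`dt`), which gives constant coefficients a_11 = 1 - αΔ, a_12 = σΔ, b_1 = gΔ, a_21 = ρΔ, a_22 = 1 - γΔ, b_2 = hΔ. The function names and all numerical values below are hypothetical.

```python
def euler_richardson_coefficients(sigma, alpha, rho, gamma, g, h, dt):
    """Forward-Euler coefficients of (5.3)-(5.4) for the Richardson model (5.1)-(5.2)."""
    a11, a12, b1 = 1.0 - alpha * dt, sigma * dt, g * dt
    a21, a22, b2 = rho * dt, 1.0 - gamma * dt, h * dt
    return (a11, a12, b1), (a21, a22, b2)


def step(x1, x2, coeffs):
    """One stage of the multistage model (5.3)-(5.4) with constant coefficients."""
    (a11, a12, b1), (a21, a22, b2) = coeffs
    return a12 * x2 + a11 * x1 + b1, a21 * x1 + a22 * x2 + b2


coeffs = euler_richardson_coefficients(sigma=0.5, alpha=1.0, rho=0.4,
                                       gamma=1.0, g=1.0, h=1.0, dt=0.1)
x = (0.0, 0.0)
for _ in range(200):
    x = step(*x, coeffs)
print(x)
```

Since the fixed point of the Euler map is exactly the equilibrium of (5.1)-(5.2), and σρ < αγ for these values, the iterates approach (1.875, 1.75).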

In an attempt to model how the coefficients $a_{ij}(k)$ might evolve, and to explain how the nations' decision processes might lead to the model in (5.1) and (5.2), Simaan and Cruz [Simaan and Cruz (1975a)] proposed a Nash equilibrium model for the following multiperson decision problem:


The fundamental model for the arms levels is given by the pair of equations

$$x_1(k+1)=\delta_1 x_1(k)+Z_1(k), \tag{5.5}$$

$$x_2(k+1)=\delta_2 x_2(k)+Z_2(k), \tag{5.6}$$

where $\delta_1 x_1(k)$ and $\delta_2 x_2(k)$ are the depreciated values of the arms stocks at stage k+1, and $Z_1(k)$ and $Z_2(k)$ are investments in arms. We seek strategies which are feedback Nash equilibrium strategies with respect to some objective functions. Thus $Z_1(k)$ and $Z_2(k)$ will be functions of the current arms levels $x_1(k)$ and $x_2(k)$. The objective functions are modeled to be

$$\begin{aligned}J_i(Z_1,Z_2)={}&\tfrac{1}{2}Q_i(N+1)\big(x_i(N+1)-P_i(N+1)x_j(N+1)-V_i(N+1)\big)^2\\&+\tfrac{1}{2}\sum_{k=1}^{N}\Big\{R_i(k)\big(Z_i(k)-W_i(k)\big)^2+Q_i(k)\big(x_i(k)-P_i(k)x_j(k)-V_i(k)\big)^2\Big\},\quad i=1,2,\ j\neq i,\end{aligned} \tag{5.7}$$

where $R_1(k)$ and $R_2(k)$ are strictly positive real numbers and $Q_1(k)$, $Q_2(k)$, $P_1(k)$, and $P_2(k)$ are nonnegative real numbers for each k. Thus each nation wishes to narrow the gap between its armament level and an affine function of its opponent's armament level, while at the same time minimizing its armament expenditures.

Using dynamic programming, the feedback Nash equilibrium solutions are found to be

$$Z_1(n)=A_{11}(n)x_1(n)+A_{12}(n)x_2(n)+B_1(n), \tag{5.8}$$

$$Z_2(n)=A_{21}(n)x_1(n)+A_{22}(n)x_2(n)+B_2(n), \tag{5.9}$$

where $A_{ij}(n)$ and $B_i(n)$ satisfy some recursive equations. When these are substituted in (5.5) and (5.6), the final feedback Nash equilibrium model is given by

$$x_1(k+1)=\big(\delta_1+A_{11}(k)\big)x_1(k)+A_{12}(k)x_2(k)+B_1(k), \tag{5.10}$$

$$x_2(k+1)=\big(\delta_2+A_{22}(k)\big)x_2(k)+A_{21}(k)x_1(k)+B_2(k). \tag{5.11}$$

Thus the coefficients in the discrete-time Richardson model of (5.3) and (5.4) may be related to the depreciation coefficients in (5.5) and (5.6), and to the coefficients of the objective functions in (5.7) associated with a multiperson decision problem: comparing (5.10) and (5.11) with (5.3) and (5.4), each $a_{ij}(k)$ equals the corresponding feedback gain $A_{ij}(k)$, with the depreciation coefficient added on the diagonal, and $b_i(k)=B_i(k)$. The modeling problem is thereby shifted to a choice of weighting coefficients in the objective functions of (5.7). For more details see Simaan and Cruz (1975a). An outline for obtaining the feedback Stackelberg solution for this arms race problem is given in Simaan and Cruz (1976).
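The recursive equations for $A_{ij}(n)$ and $B_i(n)$ are not spelled out here. As a sketch under stated assumptions, the standard backward recursion for a discrete-time linear-quadratic feedback Nash equilibrium can be written for (5.5)-(5.7): each nation's cost-to-go is kept as a quadratic $\tfrac12 x'S_i x + s_i'x$, the two first-order conditions at each stage are solved jointly for the affine rule $Z(k)=A(k)x(k)+B(k)$, and both value functions are then updated along the closed loop. This is our own derivation, not necessarily the form used by Simaan and Cruz (1975a); the weights are held constant in k for brevity, the per-stage 2x2 system is assumed nonsingular, and all function names are hypothetical.

```python
# Sketch: feedback Nash recursion for the arms model (5.5)-(5.7), pure Python.
# Assumptions (ours): weights Q_i, P_i, V_i, R_i, W_i constant in k; M nonsingular.

def mat_mul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][m] * Y[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def mat_vec(X, v):
    return [X[i][0] * v[0] + X[i][1] * v[1] for i in range(2)]


def solve2(M, r):
    """Solve the 2x2 linear system M z = r by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]


def feedback_nash_gains(delta, Q, P, V, R, W, N):
    """Return {k: (A, B)} with Z(k) = A x(k) + B, in the form of (5.8)-(5.9).

    Every argument except N is a pair (nation 1, nation 2).
    """
    c = [[1.0, -P[0]], [-P[1], 1.0]]          # x_i - P_i x_j written as c_i' x
    D = [[delta[0], 0.0], [0.0, delta[1]]]    # depreciation in (5.5)-(5.6)
    # Terminal cost 1/2 Q_i (c_i' x - V_i)^2 at stage N+1 (constants dropped).
    S = [[[Q[i] * c[i][p] * c[i][q] for q in range(2)] for p in range(2)] for i in range(2)]
    s = [[-Q[i] * V[i] * c[i][p] for p in range(2)] for i in range(2)]
    gains = {}
    for k in range(N, 0, -1):
        # Stationarity for nation i: R_i (Z_i - W_i) + [S_i (D x + Z) + s_i]_i = 0,
        # stacked over i as M Z = -F x + r.
        M = [[(R[i] if i == j else 0.0) + S[i][i][j] for j in range(2)] for i in range(2)]
        F = [mat_mul(S[i], D)[i] for i in range(2)]
        r = [R[i] * W[i] - s[i][i] for i in range(2)]
        cols = [solve2(M, [-F[0][j], -F[1][j]]) for j in range(2)]
        A = [[cols[j][i] for j in range(2)] for i in range(2)]
        B = solve2(M, r)
        gains[k] = (A, B)
        # Closed loop x(k+1) = (D + A) x(k) + B; update both cost-to-go functions.
        Acl = [[D[i][j] + A[i][j] for j in range(2)] for i in range(2)]
        newS, news = [], []
        for i in range(2):
            a = A[i]  # nation i's state-feedback row
            AtSA = mat_mul(transpose(Acl), mat_mul(S[i], Acl))
            newS.append([[Q[i] * c[i][p] * c[i][q] + R[i] * a[p] * a[q] + AtSA[p][q]
                          for q in range(2)] for p in range(2)])
            SB = mat_vec(S[i], B)
            lin = mat_vec(transpose(Acl), [SB[p] + s[i][p] for p in range(2)])
            news.append([-Q[i] * V[i] * c[i][p] + R[i] * (B[i] - W[i]) * a[p] + lin[p]
                         for p in range(2)])
        S, s = newS, news
    return gains


def simulate(gains, x1, delta, N):
    """Roll the closed-loop model (5.10)-(5.11) forward from x(1)."""
    xs = [list(x1)]
    for k in range(1, N + 1):
        A, B = gains[k]
        x = xs[-1]
        xs.append([delta[i] * x[i] + A[i][0] * x[0] + A[i][1] * x[1] + B[i]
                   for i in range(2)])
    return xs
```

For example, `feedback_nash_gains((0.9, 0.8), (1.0, 2.0), (0.5, 0.7), (1.0, 0.5), (0.3, 0.4), (0.2, 0.1), 3)` returns the gains for stages 1 to 3. A convenient sanity check is that no nation can lower its own cost (5.7) by changing its investment at any single stage while both follow the computed rules elsewhere.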


5.2.  Dynamic  Duopoly  with  Production  Constraints 

In Simaan and Takayama (1978), a dynamic duopoly model is studied with a linear demand of the form

$$\dot{p}=c-ap-b(x_1+x_2), \tag{5.12}$$

where p is the commodity price and $x_i$ is the output of firm i. The cost of production is

$$C_i(x_i)=\tfrac{1}{2}a_i x_i^2, \tag{5.13}$$

and the total profit for firm i over the horizon T is

$$\Pi_i(x_1,x_2)=\int_0^T e^{-rt}\Big[p x_i-\tfrac{1}{2}a_i x_i^2\Big]\,dt \tag{5.14}$$

for i = 1, 2. The productions $x_i$ are to be chosen as functions of the instantaneous price p(t), and it is assumed that the production capacity constraints are

$$0\le x_i[t,p(t)]\le \bar{x}_i,\qquad i=1,2. \tag{5.15}$$

Open-loop and feedback Nash equilibrium solutions are investigated in Simaan and Takayama (1978), where nine possibilities are explored, depending on whether firm i is not producing, producing at maximum capacity, acting as a monopolist, or playing as a true duopolist. For more details, see Simaan and Takayama (1978).
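To make the ingredients (5.12)-(5.15) concrete, the following is a minimal forward-Euler sketch that evaluates the discounted profits (5.14) for a given pair of output rules, enforcing (5.15) by clipping. It does not compute the open-loop or feedback Nash solutions of Simaan and Takayama; the output rule, the step size `dt`, and all numbers are hypothetical.

```python
import math


def simulate_duopoly(policies, caps, p0, a, b, c, a_cost, r, T, dt=1e-3):
    """Forward-Euler evaluation of the discounted profits (5.14).

    policies[i](t, p) is a hypothetical feedback output rule for firm i; its value
    is clipped to the capacity band 0 <= x_i <= caps[i] of (5.15).
    The price follows (5.12): dp/dt = c - a p - b (x_1 + x_2).
    """
    p, t = p0, 0.0
    profits = [0.0, 0.0]
    for _ in range(int(round(T / dt))):
        x = [min(max(policies[i](t, p), 0.0), caps[i]) for i in range(2)]
        for i in range(2):
            # Left-endpoint rule for the integrand exp(-rt) [p x_i - a_i x_i^2 / 2].
            profits[i] += math.exp(-r * t) * (p * x[i] - 0.5 * a_cost[i] * x[i] ** 2) * dt
        p += (c - a * p - b * (x[0] + x[1])) * dt
        t += dt
    return profits


def proportional_rule(t, p):
    """Hypothetical rule: produce in proportion to the current price."""
    return 0.5 * p


print(simulate_duopoly([proportional_rule, proportional_rule], caps=(2.0, 2.0),
                       p0=4.0, a=1.0, b=1.0, c=6.0, a_cost=(1.0, 1.0), r=0.05, T=10.0))
```

Any pair of candidate equilibrium rules can be plugged in the same way, which makes the sketch a convenient check on the profit levels a derived solution predicts.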


5.3.  Electricity  Pricing 

Consider a simple model for electricity pricing, where the consumer chooses a level of consumption q to maximize his "consumer surplus", which is affected by the price of electricity. The electric utility chooses the revenue function r(q) to maximize its profit subject to capacity and subject to regulation. Such a problem was considered as a Stackelberg problem, with the utility as leader and the consumer as follower, by Ho, Luh, and Muralidharan (1981). Let the consumer surplus be modeled by

$$J_F=\tfrac{1}{2}S\big[\hat{q}^2-(\hat{q}-q)^2\big]-r(q), \tag{5.16}$$

where S and $\hat{q}$ are positive constants, and r(q) is a monotonically increasing piecewise linear function representing the cost to the consumer (revenue to the utility). The profit of the utility is

$$J_L=r(q)-\tfrac{1}{2}cq^2, \tag{5.17}$$

the capacity constraint is

$$q\le\bar{q}, \tag{5.18}$$

and the regulation constraint is

$$J_L\le kq, \tag{5.19}$$

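where c, $\bar{q}$, and k are positive constants. The information structure is such that the consumer has no information, while the utility's revenue r(q) depends on the consumption level q chosen by the consumer.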


Ho, Luh, and Muralidharan (1981) determined that

$$r(q)=pq+F \tag{5.20}$$

is a Stackelberg strategy, where

$$p=S(\hat{q}-\bar{q})>0, \tag{5.21}$$

$$F=k\bar{q}+\tfrac{1}{2}c\bar{q}^2-S\bar{q}(\hat{q}-\bar{q}). \tag{5.22}$$

The solution in (5.20), (5.21), and (5.22) has the property that $J_L$ is maximized with respect to r and q, subject to (5.18) and (5.19). Furthermore, with r(q) given as in (5.20), the optimum value of q for the consumer is $\bar{q}$, the capacity of the utility: setting $\partial J_F/\partial q=S(\hat{q}-q)-p=0$ gives $q=\hat{q}-p/S=\bar{q}$. The resulting value of the utility profit, $J_L$, is $k\bar{q}$, which is the maximum allowed by regulation.
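The arithmetic behind (5.20)-(5.22) can be checked numerically. The sketch below takes the consumer surplus in (5.16) to be $\tfrac12 S[\hat q^2-(\hat q-q)^2]-r(q)$ and uses hypothetical parameter values with $\hat q>\bar q$, so that p > 0; the function names are ours.

```python
def affine_tariff(S, q_hat, q_bar, c, k):
    """Coefficients p and F of r(q) = p q + F from (5.21)-(5.22)."""
    p = S * (q_hat - q_bar)
    F = k * q_bar + 0.5 * c * q_bar ** 2 - S * q_bar * (q_hat - q_bar)
    return p, F


def consumer_surplus(q, S, q_hat, p, F):
    """Assumed form of (5.16): 1/2 S [q_hat^2 - (q_hat - q)^2] - r(q)."""
    return 0.5 * S * (q_hat ** 2 - (q_hat - q) ** 2) - (p * q + F)


def utility_profit(q, c, p, F):
    """(5.17): revenue r(q) minus the production cost c q^2 / 2."""
    return p * q + F - 0.5 * c * q ** 2


S, q_hat, q_bar, c, k = 2.0, 5.0, 3.0, 1.0, 0.5   # hypothetical values, q_hat > q_bar
p, F = affine_tariff(S, q_hat, q_bar, c, k)
# The surplus is concave in q; its stationary point q_hat - p / S equals q_bar.
q_star = q_hat - p / S
print(p, F, q_star, utility_profit(q_star, c, p, F))  # 4.0 -6.0 3.0 1.5
```

The printed profit 1.5 equals $k\bar q = 0.5\times 3$, the regulatory ceiling; note that the fixed charge F may be negative.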
6. CONCLUDING REMARKS

In this chapter we discussed some key concepts and methods relevant to multi-person decision-making and optimization in dynamic systems. In large scale physical models, dynamic operations research models, and policy and planning models, it is crucial to explicitly model the roles of multiple decision makers if, indeed, there is more than one entity that makes choices. For certain purposes, such as in policy analysis, it may be adequate to recognize only one decision maker and subsume other decision-making aspects in general sectors. However, in the investigation of effects of significant policy changes, based on a model calibrated from data on previous policies, the predicted outcome may be misleading: when the policy is changed, the reactions of the subsumed decision makers may change, so that the fixed model being used may no longer be satisfactory. It would be preferable to explicitly model the presence of the other decision makers.

For situations where cooperation among decision-makers is desirable, the concept of Pareto optimality is appropriate. However, in noncooperative situations the Nash equilibrium concept is more natural. Hierarchies in decision-making lead to the concept of Stackelberg or leader-follower strategies. These concepts are described in this chapter for both deterministic and stochastic systems.

A critical consideration in multi-person optimization problems is the information structure. In contrast to single person decision making, which necessarily involves centralized information, the multi-person decision-making problem may involve decentralized information. Furthermore, the assumption of memory in the measurement, even in the deterministic case, generally leads to a solution different from that with no memory in the multi-person case. In contrast, memory in the measurement has no effect on the optimal solution for single person optimization problems.

For simplicity in exposition, only the class of discrete-time dynamic systems is treated. The concepts discussed in the chapter are also applicable to continuous-time dynamic systems.